The present invention relates to global tracking systems and cloud systems thereof. Specifically, the present invention relates to global tracking systems and cloud systems thereof that re-identify cross-boundary objects and join objects in terms of geographical coordinates.
In today's security industry, images captured by image capturing devices are important elements of many security-critical applications. Recent developments in Artificial Intelligence (AI) and edge computing play a critical role in capturing single-sourced sensory events (e.g., detecting an unattended suitcase with an image capturing device), which nevertheless convey partial and possibly imprecise information rather than the whole truth. Putting together multiple streams of events for cross-examination is the way to detect and figure out what actually happened in the field.
One of the key issues immediately following the detection of a meaningful event is to track and trace the moving objects related to the event in a vast area. The problem is referred to as global tracking (or extensive tracking), to distinguish it from the traditional tracking problem that associates an object from one frame to another within a single image capturing device. The algorithm that re-identifies cross-boundary objects in the global tracking problem is referred to as global re-identification (or extensive re-identification), which emphasizes the issues related to the complexity behind a great number of image capturing devices. The goal of global tracking is to project the trajectory of every object moving within a surveillance area in an accurate and efficient way.
For global re-identification, the challenge becomes severe when there are blind areas uncovered by the image capturing devices and/or when the number of image capturing devices increases substantially, which imposes difficulties in achieving high accuracy while maintaining real-time performance. A real-world example is to monitor thousands of passengers passing under hundreds of image capturing devices during rush hours in a bus station. In such an extensive environment, the key issues leading to inaccuracy are the large number of candidates to be screened by feature comparison and the fact that some objects may enter into or exit from blind areas.
Consequently, a global tracking technology that can re-identify cross-boundary objects efficiently and effectively and handle the situations in which objects enter into and exit from blind areas is urgently needed.
An objective of the present invention is to provide a global tracking system, which may be deployed in a space. The global tracking system comprises a plurality of image capturing devices and at least one processor. Each of the image capturing devices sees a pre-designated view area in the space, and the image capturing devices leave out at least one blind area uncovered by the image capturing devices in the space. The at least one processor is configured to receive a sequence of images from each of the image capturing devices, detect a plurality of object instances, and generate an object record for each of the object instances. Each of the object instances is detected from one of the images. Each of the object records comprises a tracking identity, a timestamp, and a geographical coordinate where the corresponding object is located, and the object records with the same tracking identity correspond to the same object and the same image capturing device.
The at least one processor is configured to determine that a specific object among the objects has no object record within a pre-determined time interval, find a specific object record corresponding to a last appearance of the specific object from the object records, and project that the specific object is entering into a specific blind area of the at least one blind area in the pre-determined time interval according to the geographical coordinate comprised in the specific object record. A distance between the geographical coordinate comprised in the specific object record and a boundary of the specific blind area is within a pre-determined range.
Another objective of the present invention is to provide a global tracking system, which may be deployed in a space. The global tracking system comprises a plurality of image capturing devices and at least one processor. Each of the image capturing devices sees a pre-designated view area in the space, and the image capturing devices leave out at least one blind area uncovered by the image capturing devices in the space. The at least one processor is configured to receive a sequence of images from each of the image capturing devices, detect a plurality of object instances, and generate an object record for each of the object instances. Each of the object instances is detected from one of the images. Each of the object instances corresponds to an object record, and each of the object records comprises a tracking identity, a timestamp, and a geographical coordinate where the corresponding object is located. The object records with the same tracking identity correspond to the same object and the same image capturing device.
The at least one processor is configured to detect a first appearance of a specific object viewed by a specific image capturing device, and the first appearance happened at a first time instant and at a first geographical coordinate. The at least one processor is configured to deduce at least one candidate object that has entered and has not exited from a specific blind area (due to having no object record generated after entering the specific blind area) according to a model based on the first time instant, the first geographical coordinate, and a second geographical coordinate and a second time instant of each candidate object record. The specific blind area is adjacent to the pre-designated view area corresponding to the specific image capturing device. The at least one processor is configured to determine that the specific object is one of the at least one candidate object by calculating a similarity between the specific object and each candidate object and to determine that the specific object is exiting the specific blind area.
Yet another objective of the present invention is to provide a cloud system, which may be set up to cooperate with a plurality of existing image capturing devices. Each of the image capturing devices is configured to see a pre-designated view area in a space, and the image capturing devices leave out at least one blind area uncovered by the image capturing devices in the space. The cloud system comprises a transceiving interface and at least one processor, wherein the at least one processor is electrically connected to the transceiving interface. The transceiving interface is configured to receive a sequence of images from each of the image capturing devices. The at least one processor is configured to detect a plurality of object instances and generate an object record for each of the object instances. Each of the object instances is detected from one of the images. Each of the object records comprises a tracking identity, a timestamp, and a geographical coordinate where the corresponding object is located, and the object records with the same tracking identity correspond to the same object and the same image capturing device.
The at least one processor is configured to determine that a specific object among the objects has no object record within a pre-determined time interval, find a specific object record corresponding to a last appearance of the specific object from the object records, and project that the specific object is entering into a specific blind area of the at least one blind area in the pre-determined time interval according to the geographical coordinate comprised in the specific object record. A distance between the geographical coordinate comprised in the specific object record and a boundary of the specific blind area is within a pre-determined range.
A further objective of the present invention is to provide a cloud system, which may be set up to cooperate with a plurality of existing image capturing devices. Each of the image capturing devices is configured to see a pre-designated view area in a space, and the image capturing devices leave out at least one blind area uncovered by the image capturing devices in the space. The cloud system comprises a transceiving interface and at least one processor, wherein the at least one processor is electrically connected to the transceiving interface. The transceiving interface is configured to receive a sequence of images from each of the image capturing devices. The at least one processor is configured to detect a plurality of object instances and generate an object record for each of the object instances. Each of the object instances is detected from one of the images, and each of the object instances corresponds to an object record. Each of the object records comprises a tracking identity, a timestamp, and a geographical coordinate where the corresponding object is located, wherein the object records with the same tracking identity correspond to the same object and the same image capturing device.
The at least one processor is configured to detect a first appearance of a specific object viewed by a specific image capturing device, and the first appearance happened at a first time instant and at a first geographical coordinate. The at least one processor is configured to deduce at least one candidate object that has entered and has not exited from a specific blind area (due to having no object record generated after entering the specific blind area) according to a model based on the first time instant, the first geographical coordinate, and a second geographical coordinate and a second time instant of each candidate object record. The specific blind area is adjacent to the pre-designated view area corresponding to the specific image capturing device. The at least one processor is configured to determine that the specific object is one of the at least one candidate object by calculating a similarity between the specific object and each candidate object and to determine that the specific object is exiting the specific blind area.
The global tracking systems and the cloud systems thereof provided by the present invention adopt a spatial-temporal awareness approach to track every moving object within the space. The global tracking systems and the cloud systems thereof have the ability of re-identifying cross-boundary objects and the ability of joining objects in the overlapped view areas in terms of geographical coordinates. The core idea of the invention is to adopt a blind area model to significantly narrow down the number of candidates before a feature-based re-identification algorithm is applied. The global tracking systems and the cloud systems thereof extend beyond operations of security, safety, and friendly community and can potentially be deployed to transform passengers' experience in aviation hubs, commuters' experience in transport hubs, consumers' experience in retail malls, patients' experience in hospitals, and finally users' experience in precincts.
The detailed technology and preferred embodiments implemented for the subject invention are described in the following paragraphs accompanying the appended drawings for people skilled in this field to well appreciate the features of the claimed invention.
In the following description, global tracking systems and cloud systems thereof provided according to the present invention will be explained with reference to embodiments thereof. However, these embodiments of the present invention are not intended to limit the present invention to any specific environment, applications, or implementations described in these embodiments. Therefore, description of these embodiments is only for purpose of illustration rather than to limit the scope of the present invention. It shall be appreciated that, in the following embodiments and the attached drawings, elements unrelated to the present invention are omitted from depiction. In addition, dimensions of elements and dimensional proportions among individual elements in the attached drawings are provided only for ease of depiction and illustration, but not to limit the scope of the present invention.
An embodiment of the present invention is a global tracking system 1, a schematic view of which is illustrated in
The image capturing devices C1, C2, C3, C4 are deployed in a space S, which may be a vast public area such as an airport, a bus station, or the like. It is assumed that each of the image capturing devices C1, C2, C3, C4 is individually fixed at a specific location, with parameters such as angle, height, and many other camera-specific parameters. It is possible that one or more of the image capturing devices C1, C2, C3, C4 is/are equipped on a robot (or robots), and the robot(s) stand(s) at a specific location.
The space S may be gridded into a finite set of rectangular cells (e.g. each comprising a latitude and a longitude rounded to the nth decimal place) as shown in
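As a minimal illustration of such gridding, the following Python sketch maps a geographical coordinate to its rectangular cell by rounding to n decimal places; the cell-key format and the choice of n are assumptions made for illustration only, not a prescribed implementation.

```python
# Minimal sketch of gridding the space S into rectangular cells by rounding
# a geographical coordinate to the nth decimal place. The precision n and the
# (latitude, longitude) tuple used as the cell key are illustrative assumptions.
def to_cell(latitude: float, longitude: float, n: int = 4) -> tuple:
    """Return the rectangular cell that contains the given coordinate."""
    return (round(latitude, n), round(longitude, n))

# Two nearby points fall into the same cell at 4-decimal precision.
assert to_cell(25.03312, 121.56451) == to_cell(25.03313, 121.56454)
```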
A mapping function is defined between a pre-designated view area (i.e. each of the pre-designated view areas V1, V2, V3, V4) and an image captured by the corresponding image capturing device. With the mapping function, every pixel within an image captured by the image capturing device can be mapped to a geographical coordinate covered by the corresponding pre-designated view area. Taking the image capturing device C1 and the corresponding pre-designated view area V1 as an example, every pixel within an image captured by the image capturing device C1 can be mapped to one of the geographical coordinates covered by the pre-designated view area V1 according to the corresponding mapping function. Therefore, if an object instance is detected from an image captured by the image capturing device C1, the geographical coordinate(s) where the object is located in the pre-designated view area V1 can be derived from the mapping function and the pixels where the object instance is detected (e.g. by inputting the positions of the pixels into the mapping function).
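A possible realization of such a mapping function is a planar homography calibrated per image capturing device; the sketch below assumes a pre-computed 3x3 matrix, and the matrix values shown are placeholders rather than real calibration data.

```python
import numpy as np

def pixel_to_geo(h: np.ndarray, u: float, v: float) -> tuple:
    """Map an image pixel (u, v) to a geographical coordinate via homography h."""
    p = h @ np.array([u, v, 1.0])
    return (p[0] / p[2], p[1] / p[2])

# Placeholder homography for the image capturing device C1; a real deployment
# would calibrate it offline from pixel/geographical-coordinate correspondences.
H_C1 = np.array([[1e-5, 0.0, 25.0330],
                 [0.0, 1e-5, 121.5640],
                 [0.0, 0.0, 1.0]])

lat, lon = pixel_to_geo(H_C1, u=640.0, v=360.0)
```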
In this embodiment, the global tracking system 1 further comprises a processor 11, and the processor 11 is in charge of all the operations in this embodiment. In some other embodiments, the global tracking system 1 may comprise more than one processor, wherein some processor(s) is/are in charge of detecting object instances from images and generating object records for the object instances, and other processor(s) is/are in charge of re-identifying cross-boundary objects (described in detail later) and/or joining objects in terms of geographical coordinates (described in detail later). In some other embodiments, the at least one processor may be deployed in a cloud system as shown in
Each of the aforesaid processors (e.g. the processor 11) may be any of various processors, such as graphics processing units (GPUs), central processing units (CPUs), microprocessor units (MPUs), digital signal processors (DSPs), or other computing apparatuses well-known to a person having ordinary skill in the art. The transceiving interface 13 may be a wired transmission interface or a wireless transmission interface known to a person having ordinary skill in the art, which is used to be connected to a network (e.g., the Internet, a local area network) and may receive and transmit signals and data on the network.
In this embodiment, the processor 11 receives sequences S1, S2, S3, S4 of images from the image capturing devices C1, C2, C3, C4 respectively. Every time the processor 11 receives an image (in any of the sequences S1, S2, S3, S4), the processor 11 tries to detect object instance(s) therefrom by an object detection algorithm (e.g. YOLO) and a tracking algorithm (e.g. Deep SORT). If an object instance is detected, the processor 11 generates an object record for that object instance. Each of the object records comprises a tracking identity generated by a local tracking algorithm, a timestamp (e.g. the time instant at which the image is captured), and the geographical coordinate where the corresponding object is located in the space S. The aforesaid tracking identities are locally unique to the corresponding image capturing device; that is, the object records with the same tracking identity correspond to the same object and the same image capturing device.
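The structure of an object record and the per-image processing loop can be sketched as follows; the field names and the detector/tracker interfaces (stand-ins for, e.g., a YOLO detector and a Deep SORT tracker) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class ObjectRecord:
    tracking_id: str   # locally unique to one image capturing device
    camera_id: str
    timestamp: float   # time instant at which the image was captured
    geo: tuple         # (latitude, longitude) where the object is located in S

def process_image(camera_id, image, timestamp, detector, tracker, pixel_to_geo):
    """Detect object instances in one image and emit one object record per
    instance. `detector.detect` and `tracker.update` are hypothetical
    interfaces standing in for the actual detection/tracking algorithms."""
    records = []
    for tracking_id, (u, v) in tracker.update(detector.detect(image)):
        records.append(ObjectRecord(tracking_id, camera_id, timestamp,
                                    pixel_to_geo(camera_id, u, v)))
    return records
```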
The global tracking system 1 may comprise a working memory space WM for storing the object records. In some embodiments, the working memory space WM retains object records for every object seen over the past period (e.g. 120 seconds), and object records older than the past period are immediately deleted from the working memory space WM. The working memory space WM is electrically connected to the processor 11 and may be one of a Random-Access Memory (RAM), a non-volatile memory, an HDD, or any other non-transitory storage medium or apparatus with the same function and well-known to a person having ordinary skill in the art.
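A minimal sketch of such a working memory space, assuming a simple list of records pruned by a fixed retention window, is given below.

```python
import time

class WorkingMemory:
    """Retain only object records whose timestamps fall within the retention
    window (e.g. the past 120 seconds); older records are dropped."""
    def __init__(self, retention_seconds: float = 120.0):
        self.retention = retention_seconds
        self.records = []

    def add(self, record):
        self.records.append(record)

    def prune(self, now: float = None):
        now = time.time() if now is None else now
        self.records = [r for r in self.records
                        if now - r.timestamp <= self.retention]
```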
In this embodiment, the global tracking system 1 can track every moving object within the space S due to having the ability of re-identifying cross-boundary objects and the ability of joining objects in the overlapped view areas in terms of geographical coordinates. The technique regarding re-identification of cross-boundary objects is further divided into two situations, including (a) tracking an object that is entering into (or considered as disappearing into) a blind area and (b) tracking an object that is exiting from a blind area. All of them will be described in detail below.
Herein, the operations regarding tracking an object that is entering into a blind area will be described in detail with reference to a specific example shown in
In this specific example, a specific object O1 is moving in the pre-designated view area V3 and is captured by the image capturing device C3 in five images. The processor 11 detects an object instance from each of the five images, identifies that the object instances correspond to the same specific object O1 (by a local tracking algorithm), and generates an object record for each of the detected object instances. Specifically, five object records R1, R2, R3, R4, R5 corresponding to the specific object O1 are generated. Since the processor 11 identifies that the object instances correspond to the same specific object O1, the five object records R1, R2, R3, R4, R5 comprise the same tracking identity ID1. Furthermore, the five object records R1, R2, R3, R4, R5 respectively comprise the timestamps T1, T2, T3, T4, T5 (e.g. the time instants at which the corresponding images are captured) and respectively comprise the geographical coordinates S1, S2, S3, S4, S5 (i.e. where the specific object O1 is located in the pre-designated view area V3 of the space S when the corresponding image is captured).
At some point, the processor 11 determines that the specific object O1 has no object record within a pre-determined time interval (e.g. 0.437 second), which means that the processor 11 loses track of the specific object O1. The processor 11 tries to find a specific object record corresponding to a last appearance of the specific object O1 from the object records that have been generated and collected (which may be stored in the working memory space WM). In the specific example shown in
In some embodiments, the processor 11 may refer to a probability model (e.g. a probability model based on a Poisson distribution) in order to determine whether to project that the specific object O1 has entered into the specific blind area B2 in the pre-determined time interval. In those embodiments, the probability model generates a probability based on a last-appeared time of the specific object O1 (i.e. the timestamp recorded in the specific object record R5, which corresponds to the last appearance of the specific object O1) and the distance between the geographical coordinate comprised in the specific object record R5 and the boundary of the specific blind area B2. If the probability is greater than a predetermined threshold, the processor 11 projects that the specific object O1 is entering into the specific blind area B2 in the pre-determined time interval.
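One way such a projection step could be implemented is sketched below; the boundary representation, the distance helper, and the Poisson-style probability formula are illustrative assumptions only, since the invention merely requires some probability model and a pre-determined distance range.

```python
import math
import time

def boundary_distance(geo, boundary_points):
    """Approximate distance from a geographical coordinate to the closest
    sampled point on a blind area boundary (boundary sampling is assumed)."""
    return min(math.dist(geo, p) for p in boundary_points)

def project_entering(last_record, boundary_points, now=None,
                     lost_interval=0.437, max_range=2.0,
                     rate=1.0, threshold=0.5):
    """Project whether a lost object has entered the blind area.

    lost_interval : pre-determined time interval with no new object record
    max_range     : pre-determined range to the blind area boundary
    rate, threshold : parameters of an assumed Poisson-style model
    """
    now = time.time() if now is None else now
    silence = now - last_record.timestamp
    if silence < lost_interval:
        return False            # the track is not considered lost yet
    d = boundary_distance(last_record.geo, boundary_points)
    if d > max_range:
        return False            # last appearance too far from the boundary
    # Assumed model: probability of at least one entry event during the
    # silent period, discounted by the distance to the boundary.
    p_enter = (1.0 - math.exp(-rate * silence)) * (1.0 - d / max_range)
    return p_enter > threshold
```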
Herein, the operations regarding tracking an object that is exiting from a blind area will be described in detail. Briefly speaking, once an object shows up in one of the pre-designated view areas V1, V2, V3, V4, the processor 11 projects the movement of the object as possibly coming from one of the pre-designated view areas adjacent to the blind area. The processor 11 looks for potential candidates in the working memory space WM, which retains object records for every object seen over the past period (e.g. 120 seconds). The size of the initial set of candidate objects is potentially large. The processor 11 then substantially narrows down the potential candidate objects to a very limited set (e.g. two or three). Once the candidates are narrowed down, the feature-based model is applied to re-identify the object (i.e., find out from which pre-designated view area the object comes).
For comprehension, a specific example shown in
As it is the first appearance of the specific object O3 viewed by the specific image capturing device C4, the processor 11 further determines whether the specific object O3 exits from the specific blind area B2 neighboring the pre-designated view area V4. In the specific example shown in
In some embodiments, the processor 11 may deduce the two candidate objects O1, O2 from the objects retained in the working memory space WM with reference to a probability model (e.g. a hidden Markov Chain model). For each of the candidate objects O1, O2, the probability model generates a probability based on the geographical coordinate of the first appearance of the specific object O3, the geographical coordinate corresponding to the candidate object, and a time length between the time instant of the first appearance of the specific object O3 and the time instant corresponding to the candidate object. Please note that the geographical coordinate S15 and the time instant of the first appearance of the specific object O3 are recorded in the object record R15, the geographical coordinate and the time instant of the candidate object O1 are recorded in the object record R5, and the geographical coordinate and the time instant of the candidate object O2 are recorded in the object record R9. The candidate objects O1, O2 are chosen to be the candidates of the specific object O3 because the corresponding probabilities are greater than another predetermined threshold.
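The candidate deduction described above can be sketched as a spatial-temporal filter over the records retained in the working memory space; the `probability` interface of the blind area model and the threshold values are assumptions made for illustration.

```python
def deduce_candidates(first_record, blind_area_records, blind_area_model,
                      prob_threshold=0.2, keep=3):
    """Narrow the objects projected to be inside a blind area down to a small
    candidate set for a newly appeared object.

    first_record       : record of the first appearance (coordinate y, exit time)
    blind_area_records : last records of objects projected into the blind area
                         and not yet seen exiting (coordinate x, entry time)
    blind_area_model   : assumed interface returning P(x, y, t) as trained
                         from field data (e.g. a hidden Markov Chain model)
    """
    candidates = []
    for rec in blind_area_records:
        duration = first_record.timestamp - rec.timestamp
        if duration <= 0:
            continue   # an object cannot exit before it entered
        p = blind_area_model.probability(rec.geo, first_record.geo, duration)
        if p > prob_threshold:
            candidates.append((rec, p))
    # Keep only the most probable few (e.g. two or three) for feature matching.
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:keep]
```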
After deriving the candidate objects O1, O2, the processor 11 calculates a similarity between the specific object O3 and each of the candidate objects O1, O2 in a feature-based manner. If the largest similarity among the similarities is greater than a predetermined threshold, the processor 11 determines that the specific object O3 is the candidate object corresponding to the largest similarity and determines that the specific object O3 is exiting the specific blind area B2. For convenience, it is assumed that the similarity between the specific object O3 and the candidate object O2 is the largest one and is greater than the predetermined threshold. Thus, the processor 11 considers that the specific object O3 and the candidate object O2 are the same object, and the candidate object O2 is the "predecessor" of the specific object O3. It can be considered that the specific object O3 entered into the specific blind area B2 from the pre-designated view area V3, traversed through the specific blind area B2, and is exiting from the specific blind area B2 into the pre-designated view area V4.
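The feature-based decision step can be sketched as follows, assuming appearance features (e.g. embeddings from a re-identification network) are already available for the specific object and the candidates; cosine similarity and the threshold value are illustrative choices only.

```python
import numpy as np

def reidentify(query_feature, candidate_features, sim_threshold=0.7):
    """Return the identity of the most similar candidate if its cosine
    similarity exceeds the threshold; otherwise return None, meaning the
    newly appeared object is treated as a new object."""
    best_id, best_sim = None, -1.0
    q = query_feature / np.linalg.norm(query_feature)
    for cand_id, feature in candidate_features.items():
        sim = float(q @ (feature / np.linalg.norm(feature)))
        if sim > best_sim:
            best_id, best_sim = cand_id, sim
    return best_id if best_sim > sim_threshold else None
```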
The aforesaid probability model (which may be referred to as "a blind area model") for deducing candidate object(s) is elaborated herein. The blind area model is used to precisely project the trajectories of the objects moving throughout the vast space. A blind area is mathematically a bag (denoted as "B") that an object can be added into (i.e., entering into the blind area) and removed from (i.e., exiting from the blind area) at a later time. The blind area model aims to address three issues as follows:
1. To project when and from which geographical coordinate an object enters into a blind area (i.e. entering the bag “B”);
2. To detect when and from which geographical coordinate an object (denoted as “o”) exits from the bag “B”; and
3. To deduce a limited subset of candidates from the bag “B” for feature matching with the object “o”.
Remember that all objects are anonymous and can only be seen from the pre-designated view areas. By the definition of blindness, there is no way to know the real status of objects in a blind area. The first issue is addressed by determining whether the appearance of an object is the last one seen in a view area, moving near and toward a blind area adjacent to the view area. The last appearance can only be estimated probabilistically with a no-show event after a short period of silence, even though it is not sufficient to prove that the object has indeed entered into a blind area. For an object tracked by the local tracking algorithm, losing track does not mean disappearing into a blind area. The track may be lost temporarily, and the object may re-appear in a few frames. A mathematical model is trained to determine the probability of entering into a blind area.
The second issue is addressed by detecting the first appearance of an object "o" in a view area, which is geo-spatially located very close to a blind area. The detection of the first appearance does not prove that the object "o" is exiting from a blind area if its lineage cannot be traced, that is, its origin, what happens to it, and where it moves over time. This leads to the next question: where does the object "o" come from? That is, to re-identify the object "o".
In an extensive environment, one of the critical factors for effective re-identification is the number of candidates selected for feature matching. Reducing the potential candidates from dozens to a few significantly increases the overall accuracy of extensive tracking. With the prior knowledge of the blind areas, the goal can be achieved by projecting the probable trails of objects inside the blind area and therefore filtering out those that are spatially and temporally impossible. Finally, the feature comparison algorithm is applied to re-identify the object "o" out of the potential candidates, that is, linking the object "o" with the most likely one that disappeared into the blind area a few moments earlier. Effective re-identification assures both accuracy and performance.
A hidden Markov Chain model may be adopted to deduce the potential candidates out of all possible objects that entered into a blind area. With data collected on the field, the model is trained to determine the probability distribution P(x, y, t), where the parameter t is the duration between the time of entering at the geographical coordinate x and the time of exiting at the geographical coordinate y. With the model, when an object shows up for the first time at the geographical coordinate y from a blind area, all potential candidates that disappeared into the blind area in a reasonable time range from possible geographical coordinates x are selected and matched for re-identification.
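As a toy stand-in for the trained model (not the hidden Markov Chain model itself), the sketch below approximates P(x, y, t) by collecting transit durations observed on the field per (entry cell, exit cell) pair and scoring a new duration against a fitted normal distribution; the class and method names, and the Gaussian scoring, are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

class TransitTimeModel:
    """Toy approximation of P(x, y, t): per (entry cell, exit cell) pair,
    fit a normal distribution over observed transit durations and use its
    (unnormalised) density as the probability score."""
    def __init__(self):
        self.samples = defaultdict(list)   # (entry_cell, exit_cell) -> durations

    def observe(self, entry_cell, exit_cell, duration):
        self.samples[(entry_cell, exit_cell)].append(duration)

    def probability(self, entry_cell, exit_cell, duration):
        durations = self.samples.get((entry_cell, exit_cell), [])
        if len(durations) < 2:
            return 0.0
        mu, sigma = mean(durations), max(stdev(durations), 1e-6)
        z = (duration - mu) / sigma
        return math.exp(-0.5 * z * z)
```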
Herein, the operations regarding joining objects in the overlapped view areas in terms of geographical coordinates (i.e. tracking an object in an overlapped view area) will be described in detail. For the situation of overlapped view areas, "joining objects in terms of geographical coordinates" associates one object viewed by one image capturing device with another according to the exclusion principle, which states that two distinct persons cannot show up at the same location at the same time, and therefore two instances with the same geohashes can be considered as the same person.
A specific example is given in
Specifically, the object record R17 corresponds to the image capturing device C1 (i.e. the object record R17 is generated based on an object instance detected from an image captured by the image capturing device C1). The object record R17 comprises the tracking identity ID7, the timestamp T20, and the geographical coordinate S20. In addition, the object record R20 corresponds to the image capturing device C3 (i.e. the object record R20 is generated based on an object instance detected from an image captured by the image capturing device C3). The object record R20 comprises the tracking identity ID10, the timestamp T25, and the geographical coordinate S25.
The processor 11 determines that a distance between the geographical coordinate S20 comprised in the object record R17 and the geographical coordinate S25 comprised in the object record R20 is smaller than a first threshold and determines that a time difference between the timestamp T20 comprised in the object record R17 and the timestamp T25 comprised in the object record R20 is smaller than a second threshold. Based on the aforesaid determinations, the processor 11 considers that the object instance corresponding to the object record R17 and the object instance corresponding to the object record R20 overlap, and that the object O5 corresponding to the object record R17 and the object O7 corresponding to the object record R20 are the same object in the real world (i.e. in the space S). This is based on the exclusion principle, according to which two distinct persons cannot show up at the same location at the same time. Thus, the processor 11 adjusts the geographical coordinate comprised in the object record R17 to be the same as the geographical coordinate comprised in the object record R20.
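A minimal sketch of this joining step is given below; the distance and time thresholds, and the use of a plain Euclidean distance over geographical coordinates, are assumptions made for illustration.

```python
import math

def join_overlapping(record_a, record_b, dist_threshold=0.5, time_threshold=0.1):
    """Join two object records from different image capturing devices under
    the exclusion principle: if they are close enough in both space and time,
    treat them as the same real-world object and align their coordinates."""
    close_in_space = math.dist(record_a.geo, record_b.geo) < dist_threshold
    close_in_time = abs(record_a.timestamp - record_b.timestamp) < time_threshold
    if close_in_space and close_in_time:
        record_a.geo = record_b.geo   # adjust one record to the other's coordinate
        return True
    return False
```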
In some embodiments, the processor 11 may further adjust the mapping function corresponding to the image capturing device C1 according to the distance between the geographical coordinate S20 comprised in the object record R17 and the geographical coordinate S25 comprised in the object record R20.
The convergence of digital capabilities across the digital and physical environments presents great opportunities for the security industry to embrace a new frontier of digital transformation as society continues to evolve with the rapidly shifting technological landscape. To address the leading physical security operation challenges of reactive threat management and intuition-led decision-making based on subjectivity, the security industry should adopt an operational-design-first approach, beyond technology and system implementation.
The global tracking system 1 adopts a spatial-temporal awareness approach to track every moving object within the space. The global tracking system 1 has the ability of re-identifying cross-boundary objects and the ability of joining objects in the overlapped view areas in terms of geographical coordinates. The core idea of the invention is to adopt a blind area model to significantly narrow down the number of candidates before a feature-based re-identification algorithm is applied. The global tracking system 1 extends beyond operations of security, safety, and friendly community and can potentially be deployed to transform passengers' experience in aviation hubs, commuters' experience in transport hubs, consumers' experience in retail malls, patients' experience in hospitals, and finally users' experience in precincts.
The above disclosure is related to the detailed technical contents and inventive features thereof. A person having ordinary skill in the art may proceed with a variety of modifications and replacements based on the disclosures and suggestions of the invention as described without departing from the characteristics thereof. Nevertheless, although such modifications and replacements are not fully disclosed in the above descriptions, they have substantially been covered in the following claims as appended.