This application claims priority based on Japanese Patent Application No. 2021-075295 (filed Apr. 27, 2021), the entire disclosure of which is hereby incorporated by reference.
The present disclosure relates to an object tracking device.
A known technology detects a nearby object and performs tracking to predict the motion of the detected object. For example, Patent Literature 1 discloses a device that detects the presence or absence of approaching vehicles and pedestrians by processing an image signal. The image signal is outputted from a vehicle-mounted camera that captures images around the vehicle. The device displays approaching vehicles and pedestrians, each being marked with a rectangular frame.
In one embodiment, an object tracking device includes an input interface, a processor, and an output interface. The input interface is configured to acquire sensor data. The processor is configured to detect a detection target from the sensor data and track the detection target using Kalman filters associated with each of the detection target and an observed value. The output interface is configured to output a detection result regarding the detection target. The processor is configured to impose a limit on a range of variation in an index of the Kalman filters that influences tracking of the detection target.
Hereinafter, an embodiment of the present disclosure will be described with reference to the drawings. The drawings used in the following description are schematic. The dimensional proportions and the like in the drawings do not necessarily match the real proportions.
In the present embodiment, the object tracking device 20 acquires a moving image from the imaging device 10 as sensor data. That is, in the present embodiment, the sensor used to detect a detection target is an image sensor 12. The image sensor 12 is provided in the imaging device 10 and captures an image of visible light. However, the object tracking system 1 is not limited to the configuration illustrated in
In the present embodiment, the object tracking system 1 is mounted on a moving body. The detection target is an object 40 (see
As illustrated in
The imaging device 10 includes an imaging optical system 11, an image sensor 12, and a processor 13.
The imaging device 10 may be installed at any of various positions in or on the vehicle 100. The imaging device 10 may be, for example, but is not limited to, a front camera, a left side camera, a right side camera, a rear camera, or the like. The front camera, left side camera, right side camera, and rear camera are installed in or on the vehicle 100 to capture images of the areas in front of, on the left side of, on the right side of, and behind the vehicle 100, respectively. In the embodiment described as one example hereinafter, as illustrated in
The imaging optical system 11 may include one or more lenses. The image sensor 12 may include a charge-coupled device (CCD) image sensor or a complementary metal-oxide-semiconductor (CMOS) image sensor.
The imaging optical system 11 is configured to form an image of an object (subject image) on an imaging surface of the image sensor 12. The image sensor 12 is configured to convert the image of the object into an electrical signal. The image sensor 12 can capture a moving image at a predetermined frame rate. Each still image included in a moving image is referred to as a frame. The frame rate is the number of frames captured per second. The frame rate may be 60 frames per second (fps) or 30 fps, for example.
The processor 13 is configured to control the imaging device 10 as a whole and execute various types of image processing on a moving image outputted from the image sensor 12. The image processing to be performed by the processor 13 may include any processing, such as distortion correction, brightness adjustment, contrast adjustment, and gamma correction.
The processor 13 may include one or more processors. For example, the processor 13 includes one or more circuits or units configured to perform one or more data computation procedures or processes by executing instructions stored in an associated memory. The processor 13 includes at least one processor, microprocessor, microcontroller, application-specific integrated circuit (ASIC), digital signal processor (DSP), programmable logic device (PLD), or field-programmable gate array (FPGA), any combination of these devices or configurations, or any combination of other known devices or configurations.
The object tracking device 20 includes an input interface 21, storage 22, a processor 23, and an output interface 24.
The input interface 21 is configured to communicate with the imaging device 10 through wired or wireless communication. The input interface 21 acquires a moving image from the imaging device 10 as sensor data. The input interface 21 may support the transmission scheme of the image signal that the imaging device 10 transmits. The input interface 21 may also be referred to as an inputter or acquirer. The imaging device 10 and the input interface 21 may be connected by an in-vehicle communication network, such as a controller area network (CAN).
The storage 22 is a storage device storing data and a program required for processing to be performed by the processor 23. For example, the storage 22 temporarily stores a moving image acquired from the imaging device 10. For example, the storage 22 stores data generated by processing performed by the processor 23. One or more from among semiconductor memory, magnetic memory, and optical memory, for example, may be used to form the storage 22. The semiconductor memory may include volatile memory and non-volatile memory. The magnetic memory may include a hard disk and magnetic tape, for example. The optical memory may include a Compact Disc (CD), a Digital Versatile Disc (DVD), and a Blu-ray® Disc (BD), for example.
The processor 23 controls the object tracking device 20 as a whole. The processor 23 recognizes an image of an object included in a moving image acquired through the input interface 21. The processor 23 maps the coordinates of the recognized image of the object to the coordinates of the object 40 in a virtual space 46 (see
The processor 23 detects a detection target from a moving image and tracks the detection target using Kalman filters. The processor 23 is capable of detecting a plurality of detection targets from a moving image and tracking each of the plurality of detection targets using Kalman filters. In the related art, if a plurality of detection targets with overlapping images are detected in a moving image, tracking errors occur or the tracking accuracy is lowered. In the present embodiment, the processor 23 can avoid such problems by associating one or more Kalman filters with each of a plurality of detection targets. The processor 23 manages observed values, Kalman filters, and identification information unique to each tracked object (hereinafter “tracked object ID”) in layers (tiers). The processor 23 determines whether tracked objects are the same object (same detection target) and executes a process of associating the observed values, Kalman filters, and tracked object IDs. Thus, the accuracy of tracking a plurality of detection targets can be further improved.
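The layered management of observed values, Kalman filters, and tracked object IDs can be pictured as a small registry. The following Python sketch is a minimal illustration under assumptions: the class names `KalmanTrack`, `TrackedObject`, and `TrackRegistry` and their methods are hypothetical, and the filter state itself is omitted. It only shows that several filters may share one observed value and that several filters may share one tracked object ID.

```python
from dataclasses import dataclass, field


@dataclass
class KalmanTrack:
    """Middle layer: one Kalman filter (state omitted in this sketch)."""
    kf_id: int
    observation_ids: list[int] = field(default_factory=list)  # observed values associated this frame


@dataclass
class TrackedObject:
    """Top layer: one tracked object ID grouping one or more Kalman filters."""
    object_id: int
    kf_ids: list[int] = field(default_factory=list)


class TrackRegistry:
    """Hypothetical three-layer registry: observed values -> Kalman filters -> tracked object IDs."""

    def __init__(self) -> None:
        self.filters: dict[int, KalmanTrack] = {}
        self.objects: dict[int, TrackedObject] = {}

    def associate_observation(self, kf_id: int, obs_id: int) -> None:
        # Several filters may share one observed value (many-to-one is allowed).
        self.filters.setdefault(kf_id, KalmanTrack(kf_id)).observation_ids.append(obs_id)

    def group(self, object_id: int, kf_ids: list[int]) -> None:
        # Several filters may belong to the same tracked object ID.
        self.objects[object_id] = TrackedObject(object_id, list(kf_ids))

    def objects_of_observation(self, obs_id: int) -> set[int]:
        """Return the tracked object IDs reachable from an observed value through its filters."""
        kfs = {k for k, t in self.filters.items() if obs_id in t.observation_ids}
        return {o.object_id for o in self.objects.values() if kfs & set(o.kf_ids)}
```

For example, if filters 1 and 2 are both associated with observed value 10 and grouped under tracked object ID 100, a lookup from observed value 10 resolves to the single ID 100 rather than to two separate objects.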
The processor 23 is configured to impose a limit on a range of variation in an index of the Kalman filters that influences tracking of the detection target. In the present embodiment, the processor 23 imposes an upper limit and/or a lower limit on an index including at least one selected from the group consisting of the Mahalanobis distance used to associate Kalman filters with an observed value, the radius of a grouping area used to associate Kalman filters with a detection target, and the size of the error ellipse of a Kalman filter. Appropriately limiting the range of variation in an index of the Kalman filters enables consistent tracking that is resistant to intermittent loss of the association between the Kalman filters and a detection target or an observed value. Details of the processing to be performed by the processor 23 will be described later. Like the processor 13 of the imaging device 10, the processor 23 may include a plurality of processors. Like the processor 13, the processor 23 may be a combination of multiple types of devices.
The output interface 24 is configured to output an output signal from the object tracking device 20. The output interface 24 may also be referred to as an outputter. The output interface 24 may output a detection result regarding a detection target. The detection result may be the coordinates of the point mass 45, for example.
The output interface 24 may include a physical connector and a wireless communicator. The output interface 24 may be connected to a CAN or other network of the vehicle 100, for example. The output interface 24 may be connected to the display 30, a control device and alarm device of the vehicle 100, and the like through a CAN or other communication network. Information outputted from the output interface 24 may be used, as appropriate, by each of the display 30, the control device, and the alarm device.
The display 30 may display a moving image outputted from the object tracking device 20. The display 30 may receive, from the object tracking device 20, the coordinates of the point mass 45 representing the position of the image of an object. In this case, the display 30 may have a function for generating an image element (for example, a warning to be displayed together with an approaching object) in accordance with the received coordinates, and overlaying the generated image element onto the moving image. Any of various types of devices may be adopted as the display 30. For example, a liquid crystal display (LCD), an organic electro-luminescence (EL) display, an inorganic EL display, a plasma display panel (PDP), a field emission display (FED), an electrophoretic display, or a twisting ball display may be adopted as the display 30.
The flowchart in
The flowchart in
The processor 23 acquires each frame of a moving image from the imaging device 10 through the input interface 21 (step S101).
The processor 23 uses image recognition to recognize the object image 42 in each frame of the moving image (step S102). Any of various known methods may be used to recognize the object image 42. For example, the object image 42 may be recognized by a method based on recognizing the shapes of objects such as cars and people, a method based on template matching, a method of calculating features from an image and using the features for matching, or the like. Features can be calculated by using a function approximator that can be trained to learn input-output relationships. As an example, a neural network can be used as such a function approximator.
The processor 23 maps the coordinates (u, v) of the object image 42 in the image space 41 to coordinates (x′, y′) of the object in the virtual space 46 (see
As illustrated in
As illustrated in
To track the point mass 45, estimation using Kalman filters based on a state-space model can be adopted, for example. Prediction and estimation using Kalman filters improve robustness against false negatives, false positives, and the like in detecting the object 40. In general, it is difficult to describe the motion of the object image 42 in the image space 41 with an appropriate motion model, so simple and accurate position estimation for the object image 42 in the image space 41 has been difficult. In the object tracking device 20 according to the present disclosure, mapping the object image 42 to the point mass 45 in the real space allows a model describing motion in the real space to be applied, thereby improving tracking accuracy for the object image 42. Handling the object 40 as the point mass 45 of negligible size also keeps tracking simple.
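As a minimal sketch of state-space estimation for the point mass 45, assuming a constant-velocity motion model with white-noise acceleration (one common choice; the disclosure does not specify the model), one Kalman filter per axis of the virtual space 46 can be written as follows. Treating the x′ and y′ axes independently, the class name `AxisKalman`, and the noise values `q` and `r` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one axis of the virtual space (assumed model)."""
    x: float = 0.0    # position estimate
    v: float = 0.0    # velocity estimate
    p11: float = 1.0  # covariance P = [[p11, p12], [p12, p22]]
    p12: float = 0.0
    p22: float = 1.0
    q: float = 0.01   # process-noise intensity (hypothetical value)
    r: float = 0.25   # measurement-noise variance (hypothetical value)

    def predict(self, dt: float) -> None:
        # State transition F = [[1, dt], [0, 1]]: x <- F x
        self.x += self.v * dt
        # P <- F P F^T + Q, with Q from a continuous white-noise acceleration model
        p11 = self.p11 + 2.0 * dt * self.p12 + dt * dt * self.p22
        p12 = self.p12 + dt * self.p22
        p22 = self.p22
        self.p11 = p11 + self.q * dt**3 / 3.0
        self.p12 = p12 + self.q * dt**2 / 2.0
        self.p22 = p22 + self.q * dt

    def update(self, z: float) -> None:
        # Measurement model H = [1, 0]: only the position is observed.
        s = self.p11 + self.r                # innovation variance
        k1, k2 = self.p11 / s, self.p12 / s  # Kalman gain
        y = z - self.x                       # innovation
        self.x += k1 * y
        self.v += k2 * y
        # P <- (I - K H) P
        p11, p12, p22 = self.p11, self.p12, self.p22
        self.p11 = (1.0 - k1) * p11
        self.p12 = (1.0 - k1) * p12
        self.p22 = p22 - k2 * p12
```

A point mass 45 would then be tracked by one `AxisKalman` for x′ and one for y′, calling `predict` with the frame interval (for example, 1/30 s at 30 fps) and `update` with each newly observed coordinate.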
Every time the processor 23 estimates a new position of the point mass 45, the processor 23 may map the coordinates of the point mass 45 in the virtual space 46 to coordinates (u, v) in the image space 41 to indicate the estimated position (step S105). The processor 23 can map the point mass 45 located at the coordinates (x′, y′) in the virtual space 46 to the image space 41 as a point located at the coordinates (x, y, 0) in the real space. The processor 23 can use a known method to map the coordinates (x, y, 0) in the real space to the coordinates (u, v) in the image space 41 of the imaging device 10. The processor 23 can convert back and forth between the coordinates (u, v) in the image space 41, the coordinates (x′, y′) in the virtual space 46, and the coordinates (x, y, 0) in the real space.
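One known way to convert between image coordinates (u, v) and ground coordinates (x, y, 0) is a planar homography, assuming a flat road surface and a pre-calibrated 3×3 matrix H. The sketch below is hedged accordingly: the matrix values are made up, and the helpers `apply_homography` and `invert3` are hypothetical, not part of the disclosure.

```python
Matrix3 = tuple[tuple[float, float, float], ...]


def apply_homography(hm: Matrix3, p: tuple[float, float]) -> tuple[float, float]:
    """Map a 2-D point through a 3x3 homography using homogeneous coordinates."""
    x, y = p
    u = hm[0][0] * x + hm[0][1] * y + hm[0][2]
    v = hm[1][0] * x + hm[1][1] * y + hm[1][2]
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return u / w, v / w  # divide out the homogeneous scale


def invert3(m: Matrix3) -> Matrix3:
    """Invert a 3x3 matrix via its adjugate (transposed cofactor matrix)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    c11, c12, c13 = e * i - f * h, -(d * i - f * g), d * h - e * g
    det = a * c11 + b * c12 + c * c13
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return (
        (c11 / det, -(b * i - c * h) / det, (b * f - c * e) / det),
        (c12 / det, (a * i - c * g) / det, -(a * f - c * d) / det),
        (c13 / det, -(a * h - b * g) / det, (a * e - b * d) / det),
    )


# Made-up homography mapping ground (x, y) in metres to pixels (u, v).
H_GROUND_TO_IMAGE: Matrix3 = ((800.0, 0.0, 640.0), (0.0, 800.0, 360.0), (0.0, 0.1, 1.0))
H_IMAGE_TO_GROUND = invert3(H_GROUND_TO_IMAGE)

uv = apply_homography(H_GROUND_TO_IMAGE, (2.0, 15.0))  # point mass -> image space
xy = apply_homography(H_IMAGE_TO_GROUND, uv)           # image space -> point mass
```

Because the same matrix and its inverse serve both directions, the processor can convert back and forth between the two coordinate systems without a separate calibration for each direction.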
(Data Association)
In the present embodiment, the processor 23 performs data association between M observed values and N Kalman filters. M is an integer equal to or greater than 2. N is an integer equal to or greater than M. In the example in
KF (2) is used to track the pedestrian 40A up to the time of frame (k−1), but is initialized partway through, and is not used to track the position of the detection target. KF (5) is a Kalman filter that is newly prepared due to the new recognition of the bicycle 40C in frame (k−2). Since the newly recognized bicycle 40C is also recognized in the current frame (k), KF (5) starts tracking the detection target. The other Kalman filters each continue to track a detection target from the time of frame (k−2).
In the example in
(Tracked Object ID Management)
A plurality of Kalman filters may be associated with a single observed value as described above, but a plurality of observed values may also be associated with a single detection target. As an example, consider the case in which the detection target is the automobile 40B, which briefly disappears from the moving image due to a lane change or the like and then reappears in the moving image. In this case, the new observed value may be treated as belonging to a separate object. To track objects accurately, the object tracking device 20 preferably identifies each tracked object to ascertain the associations between tracked objects and observed values. In the present embodiment, the processor 23 executes tracked object ID management using a hierarchical structure as described below to group a plurality of Kalman filters and determine whether the grouped Kalman filters correspond to the same object.
The processor 23 groups a plurality of Kalman filters upon acquiring a frame of a moving image. The processor 23 updates the associations between the observed values, Kalman filters, and tracked object IDs. In the example in
(Limit on Range of Variation in Index)
As above, a plurality of Kalman filters may be associated with a single observed value, and a plurality of Kalman filters may be associated with a single detection target (detection target having a single tracked object ID). Associating a plurality of Kalman filters can make tracking resistant to failure, raise tracking accuracy, and improve robustness. To ensure consistent tracking, the processor 23 can impose a limit on the range of variation in an index of the Kalman filters that influences tracking of the detection target. As a specific example, limiting the range of variation in an index involves imposing an upper limit and/or a lower limit, and may be referred to as a clipping process. In the present embodiment, the index includes at least one selected from the group consisting of the Mahalanobis distance used to associate Kalman filters with an observed value, the radius of a grouping area used to associate Kalman filters with a detection target, and the size of the error ellipse of a Kalman filter. The following describes limiting the range of variation in each of these indices.
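The clipping process can be sketched as a clamp applied to an index. In the minimal Python sketch below, the clamp is applied to the Mahalanobis gate threshold used when associating a Kalman filter with an observed value; which quantity is clipped, the 2-D covariance representation, and the names `clip`, `mahalanobis_2d`, and `gate` are assumptions for illustration.

```python
import math
from typing import Optional

Vec2 = tuple[float, float]
Cov2 = tuple[tuple[float, float], tuple[float, float]]


def clip(value: float, lower: Optional[float] = None, upper: Optional[float] = None) -> float:
    """Clamp an index to [lower, upper]; either bound may be omitted."""
    if lower is not None and value < lower:
        value = lower
    if upper is not None and value > upper:
        value = upper
    return value


def mahalanobis_2d(z: Vec2, mean: Vec2, cov: Cov2) -> float:
    """Mahalanobis distance of observation z from a 2-D Gaussian (mean, cov)."""
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    # Quadratic form d^T S^-1 d using the closed-form inverse of a symmetric 2x2 matrix.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


def gate(z: Vec2, mean: Vec2, cov: Cov2, threshold: float,
         lower: Optional[float], upper: Optional[float]) -> bool:
    """Accept z for this filter if its distance is within the clipped gate threshold."""
    return mahalanobis_2d(z, mean, cov) <= clip(threshold, lower, upper)
```

With a lower bound in place, a gate that would otherwise shrink as the filter grows confident still admits nearby observed values, which is one way to read the goal of resisting intermittent loss of association.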
The processor 23 may limit the range of variation in an index during the data association described above. In this case, the index includes the Mahalanobis distance. The Mahalanobis distance measures how far a data point lies from a distribution, scaled by the spread (covariance) of that distribution. In the present embodiment, the Mahalanobis distance represents the divergence between the center of the error ellipse of a Kalman filter and an observed value. The error ellipse of a Kalman filter indicates the estimation range given by the probability density distribution of the position, that is, the region inside which the position lies with a certain probability (99%, as one example). The processor 23 calculates the error ellipse using, among other things, the standard deviation in the x′ direction and the standard deviation in the y′ direction of the two-dimensional virtual space 46 (see
The lower limit (see the dashed line in
The processor 23 may limit the range of variation in an index in the above tracked object ID management. In tracked object ID management, Kalman filters are associated with the same detection target by clustering, such as density-based spatial clustering of applications with noise (DBSCAN), for example. If the centers of the error ellipses of a plurality of Kalman filters are contained in a grouping area of a predetermined range, the processor 23 determines that the Kalman filters belong to a single group. The clustering algorithm is not limited to DBSCAN. For example, clustering may be executed according to another algorithm, such as k-means clustering.
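The grouping step can be sketched with a minimal stand-alone DBSCAN, where each point is the center of a Kalman filter's error ellipse in the virtual space 46 and `eps` is the radius of the grouping area. This is an illustrative assumption, not the disclosed implementation; in practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` (with its `eps` and `min_samples` parameters) could be used instead.

```python
import math
from typing import Optional


def dbscan(points: list[tuple[float, float]], eps: float, min_samples: int = 1) -> list[int]:
    """Minimal DBSCAN over ellipse centers; returns one label per point (-1 = noise)."""
    n = len(points)
    labels: list[Optional[int]] = [None] * n

    def neighbors(i: int) -> list[int]:
        # All centers (including i itself) inside the grouping area of radius eps.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_seeds)
        cluster += 1
    return [label if label is not None else -1 for label in labels]
```

Filters that receive the same label would share one tracked object ID; with `min_samples=1`, every center is a core point and no filter is labeled as noise.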
When a detection target is distant, the range of the same detection target is susceptible to the influence of observation noise. Observation noise is, for example, image distortion due to the lens included in the imaging optical system 11. When the detection target is distant, the processor 23 sets a large radius (eps) of the grouping area without going over an upper limit, as in
When a detection target is nearby, the range of the same detection target is relatively resistant to the influence of observation noise. When the detection target is nearby, the processor 23 sets a small radius (eps) of the grouping area without going under a lower limit, as in
In this way, the processor 23 imposes an upper limit and/or a lower limit on the radius (eps) of the grouping area according to observation noise and/or the distance to the detection target. By varying the imposed limit(s), the processor 23 can continue tracking with higher accuracy.
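The limits on the grouping radius can be sketched as a clamp on a radius that grows with the distance to the detection target and with observation noise. The linear model, the function name `grouping_radius`, and every numeric default below (`base`, `k_dist`, `k_noise`, `lower`, `upper`) are hypothetical choices for illustration.

```python
def grouping_radius(distance: float, noise_std: float = 0.0, *, base: float = 0.5,
                    k_dist: float = 0.05, k_noise: float = 2.0,
                    lower: float = 0.3, upper: float = 2.0) -> float:
    """Radius (eps) of the grouping area, widened for distant or noisy targets."""
    eps = base + k_dist * distance + k_noise * noise_std  # assumed linear model
    return min(max(eps, lower), upper)  # keep eps within [lower, upper]
```

A distant target thus gets a larger radius that saturates at `upper`, and a nearby target gets a smaller radius that never falls below `lower`.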
The processor 23 may limit the range of variation in an index so that the error ellipse does not become too small relative to the output accuracy (guaranteed accuracy) required for a detection result regarding a detection target. The guaranteed accuracy may be set as a range of allowable error, for example. As described above for the data association, as tracking of a detection target progresses, the tracking accuracy (the certainty of the predicted position) improves and the error ellipse decreases in size. However, if the error ellipse becomes so small that an observed value is no longer contained within the validation gate, tracking may not continue.
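The size of an error ellipse follows from the eigenvalues of the 2-D position covariance; for a probability p, the squared scale factor for two degrees of freedom is the chi-square quantile −2 ln(1 − p). The sketch below assumes a Gaussian covariance and a hypothetical floor `min_semi_axis` derived from the guaranteed accuracy, and inflates the covariance so that neither semi-axis drops below that floor while keeping the ellipse orientation. The function names are illustrative.

```python
import math

Cov2 = tuple[tuple[float, float], tuple[float, float]]


def _eig2(cov: Cov2) -> tuple[float, float, float]:
    """Eigenvalues (l1 >= l2) of a symmetric 2x2 matrix and the major-axis angle."""
    (sxx, sxy), (_, syy) = cov
    half_tr = 0.5 * (sxx + syy)
    disc = math.hypot(0.5 * (sxx - syy), sxy)
    return half_tr + disc, half_tr - disc, 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def error_ellipse(cov: Cov2, prob: float = 0.99) -> tuple[float, float, float]:
    """Semi-axes (major, minor) and angle of the prob-confidence ellipse."""
    k2 = -2.0 * math.log(1.0 - prob)  # chi-square quantile, 2 degrees of freedom
    l1, l2, theta = _eig2(cov)
    return math.sqrt(k2 * l1), math.sqrt(k2 * max(l2, 0.0)), theta


def floor_covariance(cov: Cov2, min_semi_axis: float, prob: float = 0.99) -> Cov2:
    """Inflate cov so that neither semi-axis of its ellipse falls below min_semi_axis."""
    k2 = -2.0 * math.log(1.0 - prob)
    min_var = min_semi_axis ** 2 / k2  # eigenvalue that yields exactly min_semi_axis
    l1, l2, theta = _eig2(cov)
    l1, l2 = max(l1, min_var), max(l2, min_var)
    c, s = math.cos(theta), math.sin(theta)
    off = (l1 - l2) * c * s
    # Rebuild R diag(l1, l2) R^T with the original orientation.
    return ((l1 * c * c + l2 * s * s, off), (off, l1 * s * s + l2 * c * c))
```

Applying the floor only to eigenvalues below `min_var` leaves well-sized ellipses untouched, so the limit acts purely as a lower bound.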
The processor 23 may limit the range of variation in an index so that the size of an error ellipse is appropriate according to the saturability of an observed value that depends on the distance to a detection target. The saturability of an observed value means that a limit exists on improvement in the accuracy of the position of an observed value at close range, and that no loss of accuracy occurs even if the position is farther away.
The processor 23 does not necessarily impose all of the above limits on the range of variation in an index at the same time. That is, the processor 23 may select or combine some of the above limits on the range of variation in an index. The processor 23 may select only the Mahalanobis distance as the index to be limited in the range of variation, for example. The processor 23 may select the radius of the grouping area and the size of an error ellipse according to the saturability of an observed value as indices to be limited in the range of variation, for example. In this case, the processor 23 may impose only a lower limit or only an upper limit on the size of the error ellipse.
As described above, the object tracking device 20 according to the present embodiment allows detection results to overlap in the process of tracking a plurality of detection targets. Thus, the object tracking device 20 can track a plurality of objects accurately, without creating a chain reaction of misassociations. The object tracking device 20 according to the present embodiment also limits the range of variation in an index. Thus, the object tracking device 20 can track an object accurately and consistently.
An embodiment according to the present disclosure has been described on the basis of the drawings and examples, but note that it would be easy for a person skilled in the art to make various variations or revisions on the basis of the present disclosure. Consequently, it should be understood that these variations or revisions are included in the scope of the present disclosure. For example, the functions and the like included in each component, each step, and the like may be rearranged in logically non-contradictory ways, and it is possible to combine a plurality of components, steps, or the like into one or divide a component, step, or the like. Embodiments of the present disclosure have been described mainly in terms of a device, but an embodiment of the present disclosure may also be implemented as a method including steps to be executed by each component of a device. An embodiment of the present disclosure may also be implemented as a method or program to be executed by a processor provided in a device, or as a storage medium on which the program is recorded. It should be understood that these embodiments are also included in the scope of the present disclosure. In one example, the steps of a process to be executed by the processor 23 are included in the object tracking method in
In the above embodiment, the object tracking system 1 includes the imaging device 10, the object tracking device 20, and the display 30, but the object tracking system 1 may include a device integrating at least two of the above devices. For example, the imaging device 10 may include the functions of the object tracking device 20. In this case, the imaging device 10 may include the storage 22 and the output interface 24 in addition to the imaging optical system 11, the image sensor 12, and the processor 13. The processor 13 may execute the processing performed by the processor 23 in the above embodiment on a moving image outputted by the imaging device 10. Such a configuration may be used to achieve an imaging device 10 that executes object tracking.
The “moving bodies” in the present disclosure include vehicles, marine vessels, and aircraft. “Vehicles” in the present disclosure include, but are not limited to, automobiles and industrial vehicles, and may include railway cars, lifestyle vehicles, and fixed-wing aircraft that travel on a runway. Automobiles include, but are not limited to, passenger cars, trucks, buses, motorcycles, and trolleybuses, and may include other vehicles that travel on roads. Industrial vehicles include industrial vehicles for agriculture and construction. Industrial vehicles include, but are not limited to, forklifts and golf carts. Industrial vehicles for agriculture include, but are not limited to, tractors, cultivators, transplanters, binders, combines, and lawn mowers. Industrial vehicles for construction include, but are not limited to, bulldozers, scrapers, excavators, cranes, dump trucks, and road rollers. Vehicles include human-powered vehicles. The types of vehicles are not limited to the types given above. For example, automobiles may include industrial vehicles that can travel on roads, and the same vehicle may be included in multiple types. Marine vessels in the present disclosure include marine jets, boats, and tankers. Aircraft in the present disclosure include fixed-wing and rotary-wing aircraft.
Number | Date | Country | Kind |
---|---|---|---|
2021-075295 | Apr 2021 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2022/018331 | 4/20/2022 | WO |