The present invention provides an object automatic tracking system and an identification method, which may be executed on a small-sized edge computing device.
Nowadays, object identification technology is widely applied in many fields. Compared with general robots, mobile robots such as guide robots, dish delivery robots, and dish-receiving robots, which are suitable for convenience stores, hotels, and restaurants, are required to identify dynamic obstacles in real time.
However, in consideration of cost and the applicable environment, most mobile robots cannot actually carry a computing device with high computing capability.
Therefore, an object automatic identification system and an identification method thereof capable of performing high-precision computation on a small-sized edge computing device are indeed what the industry expects.
In view of the above, the present invention relates to an automatic object tracking system and an identification method thereof.
An object automatic tracking system according to an embodiment of the present invention includes an image capturing device, a computing device and a display device, and the computing device includes a first computing module and a second computing module. The image capturing device is connected to the computing device for acquiring and transmitting an image to the computing device for processing. Further, the computing device is connected to the display device to display the final processing result on the display device.
In some embodiments, the above-mentioned first computing module includes a first portion, a second portion, and a detecting structure. The first portion includes a plurality of convolution sets and a plurality of residual blocks, which are used for performing feature extraction on the inputted first data and correspondingly outputting a plurality of initial feature maps. The second portion is connected to the first portion and is used for concatenating the initial feature maps from the first portion and correspondingly outputting at least one feature map. The detecting structure is connected to the second portion for detecting the feature maps outputted from the second portion and generating property information and location information of each target object appearing in the original input image.
An identification method of an automatic object tracking system according to an embodiment of the present invention includes the following steps. First, an image is captured by the automatic object tracking system described above. The image is converted into frame data using an MPEG encoding format and determined to be either first data or second data according to the type of each frame in the frame data. The above-mentioned first computing module then performs operations on the first data to obtain property information and location information of each target object in the image. At the same time, the above-mentioned second computing module processes the second data to obtain trajectory information of each target object. Finally, the property information, the location information, and the trajectory information are combined and output to the above-mentioned display device.
The above-mentioned descriptions are only preferred embodiments of the present invention and are not intended to limit the scope of implementation of the present invention. Therefore, all equivalent changes and modifications made to the shapes, structures, features, and spirit described in the claims of the present invention shall be included in the scope of the patent application of the present invention.
The present invention relates to an object automatic tracking system, which may be executed on a small-sized edge computing device.
Hereinafter, to make the description of the present disclosure more detailed and complete, an illustrative description of the implementation and specific embodiments of the present invention is provided. However, the following description is not the only form of implementing or using specific embodiments of the invention. These paragraphs cover the features of various specific embodiments as well as the method steps and sequences for constructing and operating them. However, other embodiments may also be utilized to achieve the same or equivalent functions and sequences of steps.
In the embodiment, the object automatic tracking system 1 encodes the originally captured images into frame data, and determines whether each frame is first data or second data according to the frame type; the first data and the second data are processed by the first computing module 200A and the second computing module 200B, respectively. This significantly reduces the processing computations and enables processing at a minimum speed of 30 fps (frames per second) on a small-sized edge computing device. In the embodiment, the small-sized edge computing device is an AI edge computing platform such as the NVIDIA Jetson Nano™, Jetson Xavier NX™, etc.
Moreover, a group of frame data is a video sequence encoded in an MPEG encoding format; a group of frame data includes at least one I frame (intra frame) as the first data and at least one P frame (predicted frame) as the second data. The computing device 20 determines the type of each frame. If a frame is determined to be an I frame, the frame is read and transferred to the first computing module 200A; otherwise, if the frame is detected to be a P frame, the frame is read and transferred to the second computing module 200B. Furthermore, the frame data is a video sequence encoded with a GOP (Group of Pictures) structure; the first data is the collection of I frames in the GOP structure, and the second data is the collection of P frames in the GOP structure.
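The frame-type routing described above can be sketched as follows. This is only a minimal illustration; `detect_module` and `track_module` are hypothetical stand-ins for the first computing module 200A and the second computing module 200B:

```python
def route_frames(frames, detect_module, track_module):
    """Route MPEG GOP frames by type: I frames go to the detection
    module (first data), P frames to the tracking module (second data).

    frames: iterable of (frame_type, frame_payload) tuples,
    where frame_type is 'I' or 'P' per the GOP structure.
    """
    results = []
    for frame_type, payload in frames:
        if frame_type == "I":
            results.append(("detection", detect_module(payload)))
        else:  # P frame
            results.append(("tracking", track_module(payload)))
    return results
```

For example, calling `route_frames` with a GOP of one I frame followed by P frames dispatches exactly one detection pass per GOP, which is the source of the computational saving described above.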
As shown in
Specifically, in the first computing module 200A, the function of the first portion 200A1 is to perform feature extraction on the target object in the first data, and the second portion 200A2 concatenates local features between feature maps of different sizes.
In the embodiment, the first portion 200A1 includes a plurality of convolution sets 2201 and a plurality of residual blocks 2202. As shown in
In the embodiment, the convolutions included in each residual block 2202 of the first computing module 200A are connected to each other. In addition, the overall computation of the neural network is directly correlated to the number of convolutional layers included in the densely connected convolution sets 2201 in each residual block 2202 and the number of filters used in each residual block 2202, and inversely correlated to the number of max pooling operations and the convolution stride.
Based on the above reasons, the user can reduce the overall neural network complexity by increasing the number of max pooling operations or increasing the convolution stride of the first computing module, which improves the execution speed of the first computing module 200A on a small-sized edge computing device. On the other hand, increasing the number of residual blocks 2202 used, or increasing the filter types to increase the number of neurons in the network, can improve the detection accuracy (for example, the user can set the numbers of residual blocks in the first portion 200A1 to 1, 15, 15, and 8, and can set the filter types to 32, 64, 128, 256, and 512). Thus, it is ensured that the edge computing device can maintain a detection accuracy above a certain level while keeping a high execution speed.
Furthermore, in the embodiment, the network complexity can be further reduced, thereby speeding up network convergence, by setting the convolutions of the second portion 200A2 to spatially separable convolutions.
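The saving from a spatially separable convolution can be shown with illustrative arithmetic (an assumption-laden sketch, not part of the embodiment): a k × k kernel is factored into a k × 1 kernel followed by a 1 × k kernel.

```python
def separable_cost(k, channels):
    """Per-layer weight count of a full k x k convolution versus its
    spatially separable factorization (k x 1 followed by 1 x k)."""
    full = k * k * channels
    separable = (k * 1 + 1 * k) * channels  # k x 1 conv + 1 x k conv
    return full, separable

# A 3 x 3 kernel needs 9 weights per channel; 3 x 1 plus 1 x 3 needs 6.
full, sep = separable_cost(3, 128)
```

The ratio improves as k grows (2k versus k²), which is why the factorization reduces network complexity.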
Please further refer to
The second computing module 200B adopts one of the above target tracking algorithms to predict the trajectory of each target object in the second data and obtain trajectory information corresponding to the target object.
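As one illustrative sketch of trajectory prediction, a simple constant-velocity extrapolation is shown below. This is only an assumed stand-in for explanatory purposes, not necessarily the tracking algorithm adopted by the second computing module 200B:

```python
class ConstantVelocityTracker:
    """Illustrative trajectory predictor: extrapolates each target's
    next position from its last two observed centroids."""

    def __init__(self):
        self.history = {}  # target_id -> list of (x, y) centroids

    def update(self, target_id, centroid):
        """Record an observed centroid for a target."""
        self.history.setdefault(target_id, []).append(centroid)

    def predict(self, target_id):
        """Linearly extrapolate the next centroid; falls back to the
        last observation when only one point is known."""
        pts = self.history.get(target_id, [])
        if len(pts) < 2:
            return pts[-1] if pts else None
        (x0, y0), (x1, y1) = pts[-2], pts[-1]
        return (2 * x1 - x0, 2 * y1 - y0)
```

Production trackers (e.g. Kalman-filter-based methods) refine this idea with noise models, but the per-frame cost remains far below running a full detection network, which matches the division of labor between the two computing modules.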
Specifically, as shown in
In this way, the first computing module 200A of the present embodiment can dramatically decrease the computation amount by inserting more max pooling and changing the stride of the convolution connected to the first residual block 2202-1 in the convolution sets 2201 to a larger value such as two. It is also possible to further increase the number of layers of the first computing module 200A to increase the number of parameters for each convolution process, thereby achieving high detection accuracy while maintaining high execution speed (for example, an AP of 90.58% on the VOC2007 test set).
In the beginning, in step S2, an original image is obtained from the image capturing device 10 and transmitted to the computing device 20; the computing device then converts the original image into a group of frame data using an MPEG encoding format, and determines each frame to be either the first data or the second data according to its type. In the embodiment, the above MPEG encoding format is based on a group of pictures (GOP); the first data is an I frame in the frame data, and the second data is a P frame in the frame data.
In the next step S3, the computing device 20 performs computation on the first data by the first computing module 200A to obtain property information and location information corresponding to each target object in the original input image; at the same time, the second computing module 200B processes the second data to obtain trajectory information corresponding to each target object in the original input image.
Finally, in step S4, the computing device 20 combines and outputs the obtained property information, location information, and trajectory information to the display device 30 so as to be reflected on the original image. In this embodiment, the merging may be implemented by executing the Non-Maximum Suppression (NMS) algorithm, the Soft-NMS algorithm, or similar algorithms in the prior art, and details are not described herein again.
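A minimal sketch of the greedy NMS merging mentioned above is given below (the standard algorithm, with boxes assumed to be in (x1, y1, x2, y2) form):

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and
    discard remaining boxes that overlap it above the threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < threshold]
    return keep
```

Soft-NMS differs only in that, instead of discarding overlapping boxes outright, it decays their scores as a function of the overlap.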
Number | Date | Country | Kind |
---|---|---|---|
111118455 | May 2022 | TW | national |