This application claims the priority benefit of Taiwan application serial no. 108145015, filed on Dec. 10, 2019. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The disclosure relates to a method and an apparatus for image processing, and more particularly to a method and an apparatus for object recognition.
In many fields, there are tasks that require manual monitoring, such as facial recognition performed at self-service immigration control facilities at airports, waste sorting at resource recycling sites, and recognition of pedestrians and vehicles by monitors installed by police stations at intersections to check for abnormalities. Some application fields rely on real-time response results. For example, in fields such as self-driving cars and self-driving ships, real-time recognition results are required. The shorter the recognition time, the shorter the delay and the more information that can be recognized, so that the information available for decision-making is more sufficient.
However, high-end photographic equipment today can shoot 120 to 240 frames per second (FPS). To make better use of the information captured by a camera, it is important to accelerate the recognition speed of the recognition model.
An embodiment of the disclosure provides a method for object recognition, applicable to an electronic apparatus that includes a processor. The method includes: receiving a video including a plurality of frames, and separating the frames into a plurality of frame groups; executing object recognition on a specific frame in each of the frame groups to recognize at least one object in the specific frame; dividing a bounded area of each object into a plurality of sub-blocks, and sampling at least one feature point within at least one of the sub-blocks; and tracking each object in the frames in the frame group according to a variation of the feature point in the frames in the frame group.
An embodiment of the disclosure provides an apparatus for object recognition, including an input/output apparatus, a storage apparatus and a processor. The input/output apparatus is coupled to an image source apparatus and configured to receive a video including a plurality of frames from the image source apparatus. The storage apparatus is configured to store the video received by the input/output apparatus. The processor is coupled to the input/output apparatus and the storage apparatus, and configured to separate the frames in the video into a plurality of frame groups, execute object recognition on a specific frame in each of the frame groups to recognize at least one object in the specific frame, divide a bounded area of each object into a plurality of sub-blocks, sample at least one feature point within at least one of the sub-blocks, and track each object in the frames in the frame group according to a variation of the feature point in the frames in the frame group.
Several exemplary embodiments accompanied with figures are described in detail below to further explain the disclosure.
The accompanying drawings are included to provide further understanding, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments and, together with the description, serve to explain the principles of the disclosure.
Objects in continuous images move little over a short period of time and retain similar features, and most images applied in actual fields are highly continuous. In view of this similarity between continuous images, embodiments of the disclosure increase recognition speed by combining object recognition with an optical flow method. An object recognition model in an embodiment of the disclosure is a deep learning object recognition model, into which a large number of images are input as training data to learn to determine the categories and positions of objects in each of the images.
In an embodiment of the disclosure, for example, a sparse optical flow method is used together with an object recognition model. According to variations of pixels across continuous frames, the movement speed and direction of an object are inferred, thereby accelerating recognition. The sparse optical flow method needs only to track a small number of feature points in the image, so the required computing resources are far less than those required by conventional object recognition. In an embodiment of the disclosure, the high-accuracy detection provided by object recognition technology works together with the small computing load and high-speed prediction of the sparse optical flow method to maintain recognition accuracy while improving object recognition speed.
The input/output device 12 is, for example, a wired or wireless communication interface such as a universal serial bus (USB), RS232, Bluetooth (BT), or wireless fidelity (Wi-Fi) interface, and is used to receive videos provided by image source devices such as cameras and camcorders. In an embodiment, the input/output device 12 may also include a network adapter that supports Ethernet or a wireless network standard such as 802.11g, 802.11n, or 802.11ac, so that the apparatus 10 for object recognition can be coupled to a network and receive videos from a remote device such as a network camera or a cloud server.
In an embodiment, the apparatus 10 for object recognition may include one of the image source devices, or may be built into the image source device. In this case, the input/output device 12 is a data bus disposed inside the apparatus, and transmits a video shot by the image source device to the processor 16 for processing. This embodiment is not limited to the foregoing architecture.
The storage device 14 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk, or a similar component or a combination thereof, and is used to store a program executable by the processor 16. In an embodiment, the storage device 14 further stores, for example, a video received by the input/output device 12 from the image source device.
The processor 16 is coupled to the input/output device 12 and the storage device 14, and may be, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, an application-specific integrated circuit (ASIC), a programmable logic controller (PLC) or another similar device or a combination thereof, and can load and execute the program stored in the storage device 14 to execute the method for object recognition in the embodiment of the disclosure.
First, in step S202, the processor 16 uses the input/output device 12 to receive a video including a plurality of frames from an image source device, and divides the received frames into a plurality of frame groups. The number of frames included in each frame group is, for example, dynamically determined by the processor 16 according to characteristics of a shooting scene, object recognition requirements, or computing resources of the apparatus, and is not limited to a fixed number of frames.
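A minimal sketch of this grouping step follows, assuming for simplicity that the frames have already been decoded into a list and that a fixed group size is used; the helper name group_frames is illustrative only:

```python
# Illustrative sketch of step S202. In practice, group_size would be
# determined dynamically according to the shooting scene, recognition
# requirements, or available computing resources.
def group_frames(frames, group_size=10):
    """Separate a list of frames into consecutive frame groups."""
    return [frames[i:i + group_size] for i in range(0, len(frames), group_size)]
```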
In step S204, the processor 16 executes object recognition on a specific frame in each of the frame groups to recognize at least one object in the specific frame. In an embodiment, the processor 16 may, for example, execute an object recognition algorithm on a first frame in each of the frame groups to recognize an object in the first frame. The processor 16 may, for example, use a pre-created object recognition model to find features in the frame and recognize the object. The object recognition model is, for example, a model created by using a convolutional neural network (CNN), a deep learning algorithm, or another type of artificial intelligence (AI) algorithm, and learns a large number of input images to recognize or distinguish different features in the image.
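The following sketch illustrates step S204 under the assumption that a pre-created recognition model is available; detect_objects is a hypothetical placeholder for whatever CNN-based or other AI detector is used, assumed to return a category label and an (x, y, w, h) bounded area per object:

```python
# Illustrative only: detect_objects stands in for any pre-created object
# recognition model (e.g., a CNN-based detector). It is assumed to return
# a list of (label, (x, y, w, h)) tuples for the recognized objects.
def recognize_first_frame(frame_group, detect_objects):
    """Run object recognition on the first frame of a frame group (step S204)."""
    specific_frame = frame_group[0]
    return detect_objects(specific_frame)  # e.g., [("car", (x, y, w, h)), ...]
```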
Referring back to the flow of the method, in step S206, the processor 16 divides a bounded area of each recognized object into a plurality of sub-blocks, and samples at least one feature point within at least one of the sub-blocks.
In an embodiment, the processor 16 may, for example, divide the bounded area of each object into a plurality of equal sub-blocks (for example, nine rectangular sub-blocks), and select, for sampling feature points, the sub-block that covers the largest area of the object among the sub-blocks (such as the central sub-block located at the center). In an embodiment, the method for dividing a bounded area and/or the number of sub-blocks are determined according to the characteristics of the object. For example, a stripe-shaped bounded area may be divided into three equal or non-equal sub-blocks. In an embodiment, the sub-block in which the feature points are to be sampled is determined according to the characteristics of the object. For example, if the object is a donut, the feature points may be sampled in a sub-block other than the central sub-block of the nine rectangular sub-blocks.
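A minimal sketch of the nine-sub-block division described above, assuming axis-aligned (x, y, w, h) rectangles:

```python
def divide_into_subblocks(bounded_area, rows=3, cols=3):
    """Divide a bounded area (x, y, w, h) into rows*cols equal sub-blocks."""
    x, y, w, h = bounded_area
    sw, sh = w // cols, h // rows
    return [(x + c * sw, y + r * sh, sw, sh)
            for r in range(rows) for c in range(cols)]

def central_subblock(bounded_area):
    """Pick the central sub-block of a 3x3 division (index 4 in row-major order)."""
    return divide_into_subblocks(bounded_area)[4]
```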
In step S208, the processor 16 tracks the object in the frames in the frame group according to a variation of the feature points in the frames in the frame group. Specifically, the processor 16, for example, randomly samples a plurality of optical flow tracking points in the sub-block selected in step S206, uses the optical flow tracking points as feature points, and uses a sparse optical flow method to track the variations of the optical flow tracking points in subsequent frames, thereby tracking the object within the frames. The sparse optical flow method may be, for example but without limitation, the Lucas-Kanade optical flow method.
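A minimal sketch of step S208 using OpenCV's Lucas-Kanade implementation; grayscale frames are assumed, and the random-sampling helper is illustrative rather than part of the original disclosure:

```python
import cv2
import numpy as np

def sample_tracking_points(subblock, num_points=10, rng=None):
    """Randomly sample optical flow tracking points within a sub-block."""
    rng = rng or np.random.default_rng()
    x, y, w, h = subblock
    pts = np.stack([rng.uniform(x, x + w, num_points),
                    rng.uniform(y, y + h, num_points)], axis=1)
    return pts.astype(np.float32).reshape(-1, 1, 2)  # Nx1x2, as OpenCV expects

def track_points(prev_gray, next_gray, prev_pts):
    """Track feature points with the Lucas-Kanade sparse optical flow method."""
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts, None, winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1  # keep only successfully tracked points
    return prev_pts[good], next_pts[good]
```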
According to the method described above, this embodiment uses an object recognition technology to select a target object, tracks the feature points across continuous images, and calculates the variation of the selected object between the continuous images, thereby maintaining recognition accuracy while improving object recognition speed.
It should be noted that, in other embodiments, the processor 16 may, for example, change the sub-block used to track the object, or change the position or size of the bounded area of the object, according to the average displacement of the optical flow tracking points in the frame and the change of the intervals between the tracking points; the disclosure is not limited thereto.
In an embodiment, the processor 16 may, for example, calculate an average displacement of the feature points within the sub-block, select a neighboring sub-block in the direction of the average displacement to replace the current sub-block, and re-sample at least one feature point within the neighboring sub-block for tracking. The average displacement is, for example, an average of the displacements of all the feature points in each direction, and may represent a movement trend of the object. In this embodiment, by shifting the tracked sub-block toward the movement direction of the object, subsequent changes in the position of the object can be tracked accurately.
In an embodiment, the processor 16 may, for example, calculate the average displacement of the feature points within the sub-block, and change the position of the bounded area of the object according to the calculated average displacement. In this embodiment, by moving the bounded area of the tracked object by the calculated average displacement and sampling and tracking feature points again in the moved bounded area, subsequent changes in the position of the object can be tracked accurately.
In an embodiment, the processor 16, for example, calculates the change of the intervals between the feature points, and changes the size of the bounded area of the object according to the calculated change. Specifically, when the size of the object in the frame changes (increases or decreases) because the object moves (closer or farther away), the intervals between corresponding feature points on the object also change, and the change of the intervals is roughly proportional to the change in the size of the object. Therefore, in this embodiment, by enlarging or reducing the size of the bounded area of the tracked object according to the calculated change of the intervals and sampling and tracking feature points again in the enlarged or reduced bounded area, subsequent changes in the position and size of the object can be tracked accurately.
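A minimal sketch combining the embodiments above: the average displacement moves the bounded area, and the change of the intervals between feature points rescales it. It assumes the Nx1x2 point arrays of the earlier sketch and that at least two points remain successfully tracked:

```python
import numpy as np

def average_displacement(prev_pts, next_pts):
    """Average displacement of the feature points; represents the movement trend."""
    return (next_pts - prev_pts).reshape(-1, 2).mean(axis=0)  # (dx, dy)

def interval_scale(prev_pts, next_pts):
    """Ratio of mean pairwise point intervals; approximates the size change."""
    def mean_interval(pts):
        p = pts.reshape(-1, 2)
        d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
        return d[np.triu_indices(len(p), k=1)].mean()
    return mean_interval(next_pts) / mean_interval(prev_pts)

def update_bounded_area(area, prev_pts, next_pts):
    """Move and rescale the bounded area according to the tracked points."""
    x, y, w, h = area
    dx, dy = average_displacement(prev_pts, next_pts)
    s = interval_scale(prev_pts, next_pts)
    nw, nh = w * s, h * s
    cx, cy = x + w / 2 + dx, y + h / 2 + dy  # shifted center
    return (cx - nw / 2, cy - nh / 2, nw, nh)
```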
In an embodiment, when a plurality of objects exist in the frame, the objects may overlap, which may affect the accuracy of object recognition and tracking. In this regard, in the foregoing embodiments of the disclosure, each object in the frame has been recognized to generate a bounded area, and the feature points for tracking the object are generated within that bounded area. Therefore, in an embodiment, the feature points may be combined with the bounded areas to avoid the impact caused by object overlap.
Specifically, in an embodiment, the apparatus for object recognition may, for example, determine whether the bounded area of an object in the frame overlaps that of another object. When an overlap is determined, the apparatus for object recognition tracks each object by using the feature points originally sampled in the sub-block to which that object belongs and excluding the feature points sampled in the sub-blocks to which other objects belong (that is, the other feature points are not included in the calculation). For example, when a first object and a second object are recognized in a specific frame, the apparatus for object recognition determines whether the bounded area of the first object overlaps the bounded area of the second object. When the bounded area of the first object overlaps the bounded area of the second object, the apparatus for object recognition uses the feature points sampled in the first object and excludes the feature points sampled in the second object to track the first object.
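A minimal sketch of this exclusion step, assuming the (x, y, w, h) areas and Nx1x2 point arrays of the earlier sketches:

```python
import numpy as np

def point_in_area(pt, area):
    """Check whether a point (px, py) lies inside an (x, y, w, h) area."""
    px, py = pt
    x, y, w, h = area
    return x <= px <= x + w and y <= py <= y + h

def exclude_overlapping_points(pts, own_area, other_areas):
    """Keep only feature points inside the object's own bounded area and
    outside the bounded areas of all other objects."""
    kept = []
    for pt in pts.reshape(-1, 2):
        if point_in_area(pt, own_area) and \
           not any(point_in_area(pt, a) for a in other_areas):
            kept.append(pt)
    return np.asarray(kept, dtype=np.float32).reshape(-1, 1, 2)
```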
The method and apparatus for object recognition according to the embodiments of the disclosure divide the frames of a video into a plurality of groups, perform object recognition on only at least one frame in each group, and randomly generate sparse optical flow tracking points in the bounded area of each recognized object; for the remaining frames in the group, the position and size of the bounded area of the object are adjusted according to the variation of the sparse optical flow tracking points. In this way, object tracking is performed and the effect of accelerating object recognition is achieved.
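Tying the sketches above together, a hypothetical end-to-end loop might look as follows. It reuses the illustrative helpers from the earlier sketches (group_frames, central_subblock, sample_tracking_points, track_points, update_bounded_area) together with the placeholder detect_objects detector; re-sampling of lost points, overlap handling, and error handling are omitted for brevity:

```python
import cv2  # the helpers from the earlier sketches are assumed to be in scope

def recognize_video(frames, detect_objects, group_size=10):
    """End-to-end sketch: recognize once per group, track in remaining frames."""
    results = []
    for group in group_frames(frames, group_size):
        prev_gray = cv2.cvtColor(group[0], cv2.COLOR_BGR2GRAY)
        objects = []
        for label, area in detect_objects(group[0]):  # step S204
            pts = sample_tracking_points(central_subblock(area))  # step S206
            objects.append([label, area, pts])
        results.append([(label, area) for label, area, _ in objects])
        for frame in group[1:]:  # step S208: track instead of re-recognizing
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for obj in objects:
                label, area, pts = obj
                prev_pts, next_pts = track_points(prev_gray, gray, pts)
                obj[1] = update_bounded_area(area, prev_pts, next_pts)
                obj[2] = next_pts
            prev_gray = gray
            results.append([(label, area) for label, area, _ in objects])
    return results
```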
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the disclosed embodiments without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the disclosure cover modifications and variations of this disclosure provided they fall within the scope of the following claims and their equivalents.