This application claims the priority benefit of Taiwan application serial no. 113102073, filed on Jan. 18, 2024. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The disclosure relates to a monitoring method and system, and in particular, relates to a warning method and a warning system.
Conventionally, in an alarm system that uses a single camera to monitor a control zone in a space, the main task is to perform image recognition and processing on an image taken by the camera to obtain the object range of a person or object in the image and the warning range of the control zone in the image. The overlapping area between the object range and the warning range is then used to determine whether a person or object has entered the control zone, and an alarm is issued accordingly.
However, in this method, since the single-camera screen is a planar projected image, the judgment of the relationship between the object range and the warning range is limited to the projection plane of the image, which causes misjudgment in the following situation. That is, a person or object has not actually entered the warning range in the real three-dimensional space, but due to the camera's viewing angle, the person or object in the image still overlaps with the warning range on the projected image plane.
The disclosure provides a warning method and a warning system capable of accurately detecting intrusion in a control zone through image recognition.
The disclosure provides a warning method adapted to be executed by a processor, and the method includes the following steps. A specified object in an image is detected. A plurality of key points on the specified object are extracted. A plurality of first pixel coordinates of the key points in an image coordinate system are converted into a plurality of first world coordinates of a world coordinate system based on a transformation matrix. A plurality of second pixel coordinates of a control zone of the image in the image coordinate system are converted into a plurality of second world coordinates of the world coordinate system based on the transformation matrix. In response to at least one of the first world coordinates falling into a range formed by the second world coordinates, a warning message is issued.
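Purely for illustration, the following Python sketch mirrors this sequence of steps; the helper names detect_object, extract_key_points, pixel_to_world, and issue_warning are hypothetical placeholders not taken from the disclosure, and the membership test on the world X-Y plane is implemented here as a simple ray-casting point-in-polygon check.

```python
from typing import Callable, Sequence, Tuple

Point2D = Tuple[float, float]

def point_in_polygon(point: Point2D, polygon: Sequence[Point2D]) -> bool:
    """Ray-casting test: is a 2D world point inside the polygon formed by the second world coordinates?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray cast to the right of (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def warn_if_intruding(
    image,
    zone_pixels: Sequence[Point2D],                 # second pixel coordinates of the control zone
    detect_object: Callable,                        # hypothetical: returns detected specified objects
    extract_key_points: Callable,                   # hypothetical: returns key-point pixel coordinates
    pixel_to_world: Callable[[Point2D], Point2D],   # hypothetical: applies the transformation matrix
    issue_warning: Callable[[str], None],
) -> None:
    zone_world = [pixel_to_world(p) for p in zone_pixels]        # second world coordinates
    for obj in detect_object(image):
        key_points = extract_key_points(image, obj)              # first pixel coordinates
        world_points = [pixel_to_world(p) for p in key_points]   # first world coordinates
        # A warning is issued when at least one key point falls into the control zone.
        if any(point_in_polygon(p, zone_world) for p in world_points):
            issue_warning("Specified object entered the control zone")
            break
```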
In an embodiment of the disclosure, the warning method further includes the following steps. A depth map of the image is obtained using a monocular depth estimation algorithm. The depth map includes a plurality of depth values of a plurality of pixels of the image. A plurality of interest points are extracted from the pixels, a plurality of depth values corresponding to the interest points are extracted from the depth map, and a plurality of camera coordinates are calculated based on pixel coordinates of the interest points in the image coordinate system, an internal parameter of an image acquisition device configured to acquire the image, and the depth values of the interest points. The transformation matrix between the image coordinate system and the world coordinate system is calculated based on the plurality of camera coordinates.
In an embodiment of the disclosure, the interest points include a first interest point, a second interest point, and a third interest point. The camera coordinates include first camera coordinates, second camera coordinates, and third camera coordinates, which are respectively the camera coordinates corresponding to the first interest point, the second interest point, and the third interest point. The step of calculating the transformation matrix between the image coordinate system and the world coordinate system further includes the following steps. A first vector is obtained based on the first camera coordinates and the second camera coordinates. A second vector is obtained based on the first camera coordinates and the third camera coordinates. A normal vector is calculated based on the first vector and the second vector. The normal vector is normalized to obtain a normalized normal vector. A rotation matrix is obtained based on the normalized normal vector. The transformation matrix is obtained based on the rotation matrix and a translation matrix.
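As a non-authoritative sketch of these steps (the exact formulas used by the disclosure are referenced only later in the detailed description and are not reproduced in this text), the transformation matrix may be computed with NumPy as follows; the Rodrigues-style construction of the rotation matrix is an assumption.

```python
import numpy as np

def transformation_matrix(cam1, cam2, cam3, translation=(0.0, 0.0, 0.0)):
    """Build the 4x4 transformation matrix from three camera-coordinate points on the world plane."""
    cam1, cam2, cam3 = (np.asarray(c, dtype=float) for c in (cam1, cam2, cam3))
    vector1 = cam2 - cam1                        # first vector
    vector2 = cam3 - cam1                        # second vector
    normal = np.cross(vector1, vector2)          # normal vector of the world plane
    n = normal / np.linalg.norm(normal)          # normalized normal vector

    # Rotation that maps the normalized normal onto the world Z axis (0, 0, 1).
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(n, z)
    c = float(np.dot(n, z))
    if np.isclose(c, -1.0):                      # normal points opposite to Z: rotate 180 degrees about X
        rotation = np.diag([1.0, -1.0, -1.0])
    else:
        vx = np.array([[0.0, -v[2], v[1]],
                       [v[2], 0.0, -v[0]],
                       [-v[1], v[0], 0.0]])
        rotation = np.eye(3) + vx + (vx @ vx) / (1.0 + c)

    # Compose the transformation matrix from the rotation matrix and the translation matrix.
    m = np.eye(4)
    m[:3, :3] = rotation
    m[:3, 3] = np.asarray(translation, dtype=float)
    return m
```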
In an embodiment of the disclosure, the step of extracting the plurality of interest points from the pixels includes the following step. The interest points are selected in a world plane of the image based on user selection.
In an embodiment of the disclosure, the step of detecting the specified object in the image includes the following step. An object detection algorithm is executed to detect the specified object in the image.
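The disclosure does not name a specific object detection algorithm; purely as an example, the sketch below assumes a YOLO-family detector from the third-party ultralytics package and keeps only detections of the person class.

```python
# Assumption: the ultralytics package and a pretrained weight file are available.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # hypothetical choice of model weights

def detect_persons(frame):
    """Return bounding boxes (x1, y1, x2, y2) of detected persons in a BGR frame."""
    results = model(frame, verbose=False)
    boxes = []
    for result in results:
        for box in result.boxes:
            if int(box.cls) == 0:   # class 0 is "person" in the COCO label set
                boxes.append(tuple(box.xyxy[0].tolist()))
    return boxes
```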
In an embodiment of the disclosure, the step of extracting the plurality of key points on the specified object includes the following step. A pose estimation algorithm is executed to extract the plurality of key points on the specified object.
The disclosure further provides a warning system including an image acquisition device and a processor. The image acquisition device is configured to acquire an image. The processor is coupled to the image acquisition device and is configured to: detect a specified object in the image, extract a plurality of key points on the specified object, convert a plurality of first pixel coordinates of the key points of an image coordinate system into a plurality of first world coordinates of a world coordinate system based on a transformation matrix, convert a plurality of second pixel coordinates of a control zone of the image of the image coordinate system into a plurality of second world coordinates of the world coordinate system based on the transformation matrix, and issue a warning message in response to at least one of the first world coordinates falling into a range formed by the second world coordinates.
To sum up, in the disclosure, the pixel coordinates of the image are converted into world coordinates to avoid misjudgment caused by the camera's viewing angle, so that intrusion detection in the control zone is performed accurately.
To make the aforementioned more comprehensible, several embodiments accompanied with drawings are described in detail as follows.
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure.
In an embodiment, the processor 110, the storage device 130, and the display 140 may be disposed in a same computer apparatus, and the image acquisition device 120 is connected to the computer apparatus in a wired or wireless manner. In another embodiment, the processor 110, the image acquisition device 120, the storage device 130, and the display 140 may also be disposed in the same computer apparatus together.
The processor 110 is, for example, a central processing unit (CPU), a physics processing unit (PPU), a programmable microprocessor, an embedded control chip, a digital signal processor (DSP), an application specific integrated circuit (ASIC), or other similar devices.
The image acquisition device 120 is, for example, a video camera or a camera using a charge coupled device (CCD) lens or a complementary metal oxide semiconductor (CMOS) lens.
The storage device 130 is, for example, a fixed or movable random access memory (RAM) in any form, a read-only memory (ROM), a flash memory, a hard disk or other similar devices, or a combination of the foregoing devices. The storage device 130 includes at least one code snippet, and after being installed, the code snippet is executed by the processor 110. In this embodiment, the storage device 130 includes an object detection module 131, a pose estimation module 132, a depth prediction module 133, and a calculation module 134. Each of the object detection module 131, the pose estimation module 132, the depth prediction module 133, and the calculation module 134 is formed by at least one code snippet. In other embodiments, the object detection module 131, the pose estimation module 132, the depth prediction module 133, and the calculation module 134 may also be implemented by hardware such as physical logic gates and circuits.
The display 140 is, for example, a liquid crystal display (LCD), a plasma display, and the like. In an embodiment, the display 140 may also be integrated with a touch panel to form a touch display.
In an embodiment, the image acquisition device 120 is disposed in a space to be monitored, performs photographing continuously to obtain at least one image 10, and sends the image 10 to the processor 110 for image processing in real time. After that, the processor 110 executes each module in the storage device 130 to determine whether a specified object enters the control zone based on the image 10 and decides whether to issue a warning signal. In addition, the processor 110 may further display the image 10 on the display 140.
Next, in step S210, the processor 110 extracts a plurality of key points on the specified objects. The processor 110 executes a pose estimation algorithm through the pose estimation module 132 to extract the key points on the specified objects. For instance, if the specified objects are human bodies, the key points are human skeleton points. A pose estimation model may adopt the PoseNet algorithm, the MoveNet algorithm, or the BlazePose algorithm.
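As one possible realization of this step, the sketch below uses the BlazePose model as provided by the third-party mediapipe package (an assumed choice; PoseNet or MoveNet could be used instead) and converts the normalized landmarks into pixel coordinates. Note that BlazePose returns 33 landmarks, whereas the embodiment refers to fifteen key points k1 to k15, so a subset would be selected in practice.

```python
# Assumption: the mediapipe and opencv-python packages are installed.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def extract_key_points(frame_bgr):
    """Return a list of (u, v) pixel coordinates of the detected human skeleton points."""
    height, width = frame_bgr.shape[:2]
    with mp_pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return []
    # Landmarks are normalized to [0, 1]; scale them back to pixel coordinates.
    return [(lm.x * width, lm.y * height) for lm in results.pose_landmarks.landmark]
```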
Herein, the processor 110 first calculates the transformation matrix between the image coordinate system and the world coordinate system to convert the pixel coordinates of the image into the world coordinates of the real space. In an embodiment, the processor 110 may execute a monocular depth estimation algorithm through the depth prediction module 133 to estimate a depth value of each pixel of the image 10, and a corresponding depth map is then obtained. That is, the depth map includes the depth values of all pixels of the image 10. In other embodiments, the image acquisition device 120 may be used to acquire an image before any object enters the real space, and the depth map of that image may be calculated in advance.
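The disclosure does not specify which monocular depth estimation model the depth prediction module 133 executes; one commonly available choice (an assumption here) is MiDaS loaded through torch.hub, as sketched below. MiDaS predicts relative inverse depth, so a calibration to metric depth values would still be needed before the coordinate calculations that follow.

```python
# Assumption: torch and opencv-python are installed; MiDaS is fetched from torch.hub.
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

def depth_map(frame_bgr):
    """Return a per-pixel depth map with the same resolution as the input image."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    batch = transform(rgb)
    with torch.no_grad():
        prediction = midas(batch)
        # Resize the prediction back to the original image resolution.
        prediction = torch.nn.functional.interpolate(
            prediction.unsqueeze(1), size=rgb.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()
    return prediction.cpu().numpy()  # relative (inverse) depth; calibrate to metric depth as needed
```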
After that, the processor 110 obtains the transformation matrix through the calculation module 134. Generally, the coordinates of the world coordinate system (world coordinates) are converted into coordinates of a camera coordinate system (camera coordinates) of the image acquisition device 120 through external parameters of the image acquisition device 120. That is, the world coordinates are converted into the camera coordinates after translation and rotation. The external parameters of the image acquisition device 120 include a rotation matrix and a translation matrix. The camera coordinates of the image acquisition device 120 are then converted into pixel coordinates of a captured image (i.e., coordinates of the image coordinate system) through projection using the internal parameters. The internal parameters of the image acquisition device 120 include a focal length f and coordinates of a principal point, and the principal point is, for example, the point where the optical axis intersects the photosensitive element. Herein, the center point of the image 10 is treated as the principal point for convenience of calculation. Accordingly, the processor 110 may use the captured image, the internal parameters and external parameters of the image acquisition device 120, and a distance estimated by depth prediction to calculate the transformation matrix for coordinate conversion between the world coordinate system and the image coordinate system.
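In the standard pinhole-camera notation (a generic formulation, not a formula reproduced from the disclosure), this chain of conversions can be written as:

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & u_{\text{center}} \\ 0 & f & v_{\text{center}} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R \mid T \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} $$

where R and T are the external parameters (rotation and translation matrices), f and (u_center, v_center) are the internal parameters, and s is a scale factor corresponding to the depth along the optical axis.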
In an embodiment, the processor 110 may select a plurality of interest points (at least three points) on a world plane (an XY plane of the world coordinate system, e.g., a real floor zone) in any image (an image with a specified object or an image without a specified object) acquired by the image acquisition device 120 to calculate the transformation matrix. For instance, the processor 110 may select the interest points of the floor zone of the image based on user selection. Alternatively, the processor 110 may determine the floor zone according to the depth map and then extract the interest points of the floor zone.
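As a minimal sketch of selection based on user input (the disclosure only states that the interest points are chosen by user selection; the click-based interface below is an assumption), OpenCV can be used to collect the clicked floor points:

```python
import cv2

def select_interest_points(image, num_points: int = 3):
    """Let the user click interest points on the displayed image; return their pixel coordinates."""
    points = []

    def on_mouse(event, x, y, flags, param):
        if event == cv2.EVENT_LBUTTONDOWN and len(points) < num_points:
            points.append((x, y))

    cv2.namedWindow("select floor points")
    cv2.setMouseCallback("select floor points", on_mouse)
    while len(points) < num_points:
        cv2.imshow("select floor points", image)
        if cv2.waitKey(30) == 27:   # allow aborting with the Esc key
            break
    cv2.destroyWindow("select floor points")
    return points
```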
For instance, the processor 110 extracts a plurality of interest points from a plurality of pixels of any image obtained by the image acquisition device 120, extracts a plurality of depth values corresponding to the interest points from the depth map, and calculates a plurality of camera coordinates based on pixel coordinates of the interest points, an internal parameter of the image acquisition device 120, and the depth values of the interest points. After that, the processor 110 calculates the transformation matrix between the image coordinate system and the world coordinate system based on the plurality of camera coordinates.
It is assumed that pixel coordinates of the interest point p1 are (u1, v1), pixel coordinates of the interest point p2 are (u2, v2), pixel coordinates of the interest point p3 are (u3, v3), the focal length of the image acquisition device 120 is f, and the principal point coordinates are (u_center, v_center). After the interest points p1 to p3 are determined, depth values d1, d2, and d3 corresponding to the interest points p1 to p3 are extracted from the depth map. Next, camera coordinates camera_1, camera_2, and camera_3 corresponding to the interest points p1 to p3 are calculated.
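The calculation itself is not reproduced in this text; under the pinhole assumptions above it takes the standard back-projection form:

$$ \text{camera\_}i = \left( \frac{(u_i - u_{\text{center}})\, d_i}{f},\; \frac{(v_i - v_{\text{center}})\, d_i}{f},\; d_i \right), \quad i = 1, 2, 3 $$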
Next, a first vector vector1 is obtained based on the camera coordinates camera_2 and the camera coordinates camera_1, and a second vector vector2 is obtained based on the camera coordinates camera_3 and the camera coordinates camera_1. A normal vector vector_normal is obtained based on a cross product of the first vector vector1 and the second vector vector2, and the normal vector vector_normal is normalized to obtain a normalized normal vector n_normalized. To normalize a vector is to scale it to unit length; normalization is a basic operation in vector arithmetic. Please refer to the following formula.
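In its usual form (supplied here since the formula itself is not reproduced in this text), this step reads:

$$ \text{vector\_normal} = \text{vector1} \times \text{vector2}, \qquad \text{n\_normalized} = \frac{\text{vector\_normal}}{\lVert \text{vector\_normal} \rVert} $$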
The following formula is then used, that is, the normalized normal vector n_normalized is rotated onto the Z vector (0, 0, 1) of the world coordinate system, and the 3×3 rotation matrix R is obtained:
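One standard formulation of such an alignment (an assumed reconstruction, since the original formula is not reproduced in this text) is the Rodrigues-style form, with z = (0, 0, 1):

$$ v = \text{n\_normalized} \times z, \quad c = \text{n\_normalized} \cdot z, \quad [v]_\times = \begin{bmatrix} 0 & -v_3 & v_2 \\ v_3 & 0 & -v_1 \\ -v_2 & v_1 & 0 \end{bmatrix}, \quad R = I + [v]_\times + \frac{[v]_\times^2}{1 + c} $$

which satisfies R · n_normalized = z whenever n_normalized is not antiparallel to the Z vector.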
The rotation matrix R is then used to obtain the transformation matrix M:
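In its usual homogeneous form (again an assumed reconstruction, as the original formula is not reproduced in this text), the transformation matrix is:

$$ M = \begin{bmatrix} R & T \\ \mathbf{0}^{\top} & 1 \end{bmatrix} $$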
where T is a 3×1 translation matrix. In an embodiment, the translation matrix T is, for example, [0, 0, 0].
After the transformation matrix M is obtained, the first pixel coordinates of the key points k1 to k15 of the specified object in the image coordinate system are converted into the first world coordinates of the world coordinate system through the transformation matrix M, and the second pixel coordinates of the control zone of the image are likewise converted into the second world coordinates.
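For illustration, a hypothetical pixel_to_world helper (in the spirit of the placeholder used in the earlier sketch, here with the camera parameters made explicit) could combine the back-projection with the transformation matrix M as follows; the per-point depth lookup and the direction in which M is applied are assumptions consistent with the description above.

```python
import numpy as np

def pixel_to_world(u, v, depth, f, u_center, v_center, m):
    """Back-project a pixel (u, v) with its depth into camera coordinates, then map it to world coordinates with M."""
    camera = np.array([(u - u_center) * depth / f,
                       (v - v_center) * depth / f,
                       depth,
                       1.0])
    world = m @ camera                 # m is the 4x4 transformation matrix M
    return world[:3]
```

The X and Y components of the result can then be tested against the control-zone polygon expressed in world coordinates.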
After that, in step S225, in response to at least one of the first world coordinates falling into a range formed by the second world coordinates, the processor 110 issues a warning message. For instance, the processor 110 transmits the warning message to an audio output device (e.g., a speaker, a stereo, an amplifier, etc.), so that the audio output device outputs an audio signal to remind the user. Alternatively, the processor 110 transmits the warning message to the display 140, so that the display 140 displays a text message, a static image message, or a dynamic image message for reminding the user. Alternatively, the processor 110 transmits the warning message to a light-emitting device, so that the light-emitting device emits light to achieve the warning effect.
In view of the foregoing, in the disclosure, a monocular depth estimation algorithm is used to obtain a depth map corresponding to an image. The control zone of the planar projected image and the key points of the specified object are then projected back to the world coordinate system based on the depth map. In this manner, intrusion detection in the control zone may be performed accurately through the use of the world coordinates.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the disclosure covers modifications and variations provided that they fall within the scope of the following claims and their equivalents.