The present invention generally relates to a technology of effectively performing video analysis of a compressed video (e.g., H.264 AVC, H.265 HEVC) in a video analysis system.
More specifically, the present invention relates to a video analysis technology for a compressed video (e.g., CCTV video) in which branching in the video analysis process is performed with reference to motion vectors obtained by parsing the compressed video, so that the computing resources required for the video analysis of the compressed video may be reduced.
In recent years, it has become common to establish CCTV-based video surveillance systems for the purposes of crime prevention and securing criminal evidence. However, there are not enough staff members to monitor all of the CCTV cameras. In order to perform video surveillance effectively with such limited staff, it would be helpful to detect meaningful objects in CCTV video by video analysis and further to display an indication in the corresponding region of the CCTV video on monitor screens.
CCTV cameras usually provide high definition (e.g., Full HD) and a high frame rate (e.g., 24 frames-per-second). In consideration of network bandwidth and storage space, high-compression video technologies such as H.264 AVC and H.265 HEVC are adopted for CCTV video. The CCTV cameras produce and provide video data in the form of compressed video according to one of the above technical standards. The video analysis system then receives the compressed video, decodes it according to the technical standard which was used to encode it, and extracts information from the CCTV video by video analysis.
In conventional solutions, video analysis of a compressed video includes decoding, downscale resizing, and image analysis. This processing is very complicated, which limits the capacity of the video analysis server in conventional video surveillance systems. Currently, a high-performance video analysis server can generally handle at most sixteen (16) CCTV channels. Because large numbers of CCTV cameras are being installed, a video surveillance system requires many video analysis servers, which causes problems such as increased cost and difficulty in securing physical space.
It is an object of the present invention to provide a technology of effectively performing video analysis of a compressed video (e.g., H.264 AVC, H.265 HEVC) in a video analysis system.
In particular, it is another object of the present invention to provide a video analysis technology for a compressed video (e.g., CCTV video) in which branching in the video analysis process is performed with reference to motion vectors obtained by parsing the compressed video, so that the computing resources required for the video analysis of the compressed video may be reduced.
In order to achieve the objects as above, the present invention discloses a video analysis method by which a video analysis server may analyze a compressed video by use of branching by motion vector.
The video analysis method of a compressed video by use of branching by motion vector of the present invention comprises: identifying a series of image frames out of the compressed video; sequentially selecting a target frame out of the series of image frames according to a predetermined selection rule; checking an image analysis flag; if the image analysis flag is OFF, determining detection of a region of moving object in the target frame based on a motion vector accumulation which is obtained by bit-stream parsing of the compressed video; if the image analysis flag is OFF and a region of moving object fails to be detected in the determining detection of a region of moving object, skipping image analysis for the selected target frame and proceeding to the selecting target frames; if the image analysis flag is ON or a region of moving object is detected in the determining detection of a region of moving object, determining detection of an effective object in the selected target frame by image analysis on the selected target frame; if an effective object fails to be detected in the determining detection of an effective object, setting the image analysis flag to OFF and proceeding to the selecting target frames; and if an effective object is detected in the determining detection of an effective object, setting the image analysis flag to ON and proceeding to the selecting target frames.
The video analysis method of the present invention may further comprise: if the selected target frame is a P frame which is located within a predetermined number at the rear end of the GOP, skipping image analysis for the selected target frame and proceeding to the selecting target frames.
In the present invention, the determining detection of a region of moving object may comprise: parsing the bit-stream of the compressed video so as to obtain motion vector and coding type for coding unit; obtaining motion vector accumulation for a predetermined time-period for each of a plurality of image blocks which constitute the image frames of the compressed video; identifying image blocks whose motion vector accumulation is higher than a predetermined first threshold; and marking the identified image blocks as region of moving object.
In the present invention, the determining detection of a region of moving object may further comprise: identifying a plurality of image blocks (hereinafter referred to as ‘neighboring blocks’) around the image blocks which are marked as region of moving object; marking as region of moving object some of the neighboring blocks whose coding type is Intra Picture; identifying neighboring blocks whose motion vector are higher than a predetermined second threshold; further marking the identified neighboring blocks as region of moving object; performing interpolation on the image frames of the compressed video, by which unmarked image blocks which are surrounded by a plurality of marked image blocks are marked as region of moving object, wherein the marked image blocks are image blocks which are marked as region of moving object, wherein the unmarked image blocks are image blocks which are not marked as region of moving object yet, and wherein the number of unmarked image blocks is less than a predetermined threshold value; and setting a lump of marked image blocks as a region of moving object, wherein the lump is formed by clumping up the marked image blocks.
The computer program according to the present invention is stored in a medium in order to execute, in combination with hardware, the video analysis method of a compressed video by use of branching by motion vector as set forth above.
Hereinafter, the present invention shall be described in detail with reference to the accompanying drawings.
Step (S1000): The video analysis server processes the compressed video so as to identify a series of image frames out of the compressed video. According to the video compression technical standards, a compressed video may comprise a series of image frames as shown in
Step (S1100): Then, the video analysis server sequentially selects target frames for image analysis out of the series of image frames according to a predetermined selection rule. The series of image frames may be selected one by one for image analysis. Alternatively, image frames may be selected intermittently at around 3-4 frames-per-second for image analysis.
Step (S1200): The video analysis server checks whether an image analysis flag is ON. The image analysis flag shall be set to ON or OFF in (S1800) and (S1900) in this specification. The initial value of the image analysis flag is presumed as OFF for convenience.
Steps (S1300, S1400): If the image analysis flag is OFF, the video analysis server checks whether a region of moving object is detected in the target frame based on a motion vector accumulation for a predetermined time-period which is obtained by bit-stream parsing of the compressed video. The procedure (S1300) of checking whether a region of moving object is detected in the target frame based on a motion vector accumulation shall be described below with reference to
If a region of moving object fails to be detected in (S1300), the video analysis server skips image analysis for the target frame and proceeds to (S1100) on the presumption that the target frame does not include any substantial information. In this case, only bit-stream parsing and some arithmetic operations need to be done for that target frame, without complicated processing such as decoding, downscale resizing, differential image obtaining, and playback image analysis. Accordingly, the computing resources may be reduced.
Steps (S1500˜S1900): If the image analysis flag is ON in (S1200), the video analysis server proceeds directly to (S1500) without going through (S1300) and (S1400). Further, if a region of moving object is detected in (S1300), the video analysis server proceeds to (S1500), considering that the target frame may include substantial information.
In (S1500), the image analysis is performed on the target frame in order to check an effective object. For example, object can be checked in the target frame by decoding, downscale resizing, differential image obtaining, and playback image analysis, as described in
In an embodiment, the image analysis of (S1500) is performed on the entire image frames, not only on the regions of moving object (e.g., blue regions in
Then, in (S1600), it is checked whether an effective object is detected by the image analysis of (S1500). If an effective object is detected in (S1500), the image analysis flag is set to ON in (S1800), and then the process proceeds to (S1100). That is, when an effective object is detected in (S1500), by setting the image analysis flag to ON, the image analysis of (S1500) shall be performed on a series of target images until the effective object disappears from the playback video. If the effective object disappears from the playback video, the image analysis of (S1500) fails to detect the effective object. In this case, the image analysis flag is set to OFF in (S1900), and then the process proceeds to (S1100).
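The branching of steps (S1100) to (S1900) can be sketched as a small loop driven by the image analysis flag. The following is only an illustrative sketch under stated assumptions: the function name `run_branching` and the two callbacks `has_moving_region` and `has_effective_object` are hypothetical stand-ins for the motion-vector check of (S1300) and the image analysis of (S1500), and are not defined in this specification.

```python
from typing import Callable, Iterable, List, Tuple


def run_branching(
    frames: Iterable[int],
    has_moving_region: Callable[[int], bool],
    has_effective_object: Callable[[int], bool],
) -> List[Tuple[int, str]]:
    """Return (frame, action) pairs, where action is 'skip' or 'analyze'.

    Both callbacks are hypothetical placeholders (assumptions):
    has_moving_region stands in for the motion-vector check (S1300),
    has_effective_object stands in for the image analysis (S1500).
    """
    flag_on = False  # initial value of the image analysis flag is OFF
    actions: List[Tuple[int, str]] = []
    for frame in frames:  # (S1100) select the next target frame
        # (S1200)-(S1400): the cheap motion-vector check runs only while
        # the flag is OFF; no moving region means image analysis is skipped.
        if not flag_on and not has_moving_region(frame):
            actions.append((frame, "skip"))
            continue
        # (S1500)/(S1600): costly image analysis on the target frame.
        detected = has_effective_object(frame)
        # (S1800) sets the flag ON, (S1900) sets it OFF.
        flag_on = detected
        actions.append((frame, "analyze"))
    return actions


if __name__ == "__main__":
    motion_frames = {2}      # frames in which a moving region is found
    object_frames = {2, 3}   # frames in which an effective object is visible
    for frame, action in run_branching(
        range(6), motion_frames.__contains__, object_frames.__contains__
    ):
        print(frame, action)
```

In this toy run, frame 4 is still analyzed because the flag was ON when it was selected; the flag returns to OFF only after that analysis fails to find the object, so frame 5 falls back to the motion-vector check.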
On the other hand, if the target frame is a P frame which is located within a predetermined number at the rear end of the GOP, the image analysis for the target frame is skipped and the process proceeds to (S1100).
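The rear-end rule above can be sketched as a simple index test. This is a minimal sketch under assumptions: the function name, the zero-based index within the GOP, and the example GOP length of 30 frames (one I frame followed by 29 P frames) are illustrative choices and are not fixed by this specification.

```python
def is_skippable_tail_p_frame(
    index_in_gop: int, gop_size: int, tail_count: int, frame_type: str
) -> bool:
    """Return True for a P frame within `tail_count` frames of the GOP's rear end.

    index_in_gop is zero-based, with index 0 being the leading I frame
    (an assumption made for this sketch).
    """
    return frame_type == "P" and index_in_gop >= gop_size - tail_count


if __name__ == "__main__":
    gop_size = 30   # assumed: 1 I frame + 29 P frames
    tail_count = 9  # the predetermined number of rear-end P frames
    skipped = [
        i
        for i in range(gop_size)
        if is_skippable_tail_p_frame(
            i, gop_size, tail_count, "I" if i == 0 else "P"
        )
    ]
    print(skipped)
```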
In general, in consideration of computing resources, the video analysis server does not perform image analysis on all of the frames, but only on about 3 or 4 frames per second. For example, assume that the compressed video has 30 frames-per-second and the video analysis server performs image analysis once every 10 frames. In a GOP, the video analysis server performs image analysis on the leading I frame, and then on every 10th P frame. Then, in a GOP, there is no need to analyze the P frames which are located within 9 of the rear end of the GOP, i.e., the 21st to 29th P frames. Skipping the decoding of these frames may reduce the computing resources. Meanwhile, in some embodiments, the B frame may be mixed with the P frames in
In the present invention, the regions which appear to contain a moving object (i.e., regions of moving object) may be extracted from a compressed video with reference to syntax information. That is, without any knowledge of the contents or story of the compressed video, lumps of image blocks which are presumed to include a moving object may be extracted with reference to syntax information. For this purpose, without the necessity of decoding the compressed video, regions of moving object may be extracted quickly by use of the syntax information of each of the image blocks, which is obtained by bit-stream parsing of the compressed video. The image blocks may be any one or a combination of macro blocks, sub-blocks, etc. The syntax information is preferably motion vector and coding type. The regions of moving object which are thus obtained may fail to accurately reflect the boundary line of the moving objects. However, the processing speed is fast and the reliability is high, as confirmed in several images attached to this specification.
Step (S100): First, effective movements to which substantial meaning may be given are detected in the compressed video based on motion vector of the compressed video. Then, the image regions in which the effective movements are detected are set as regions of moving object.
For this purpose, motion vector and coding type are parsed for coding units of the compressed video according to a video compression technical standard such as H.264 AVC or H.265 HEVC. The size of the coding unit usually ranges from about 64×64 pixels down to 4×4 pixels. However, the size may be flexibly configured. For each of the image blocks, motion vectors are accumulated for a predetermined time-period spanning a plurality of frames (e.g., 500 msec). For example, if the compressed video has 30 frames-per-second, motion vectors are accumulated for each of the image blocks for 15 frames (i.e., 500 msec). Then, it is checked whether the motion vector accumulation is higher than a predetermined first threshold (e.g., 20). When an image block is found whose motion vector accumulation is higher than the first threshold, it is regarded that effective movement is found in the image block, and accordingly the found image block is marked as region of moving object. On the other hand, even if motion vectors are obtained in an image block, if its motion vector accumulation for the specific time-period fails to be higher than the first threshold, it is regarded that the corresponding change in video is rather small, and accordingly the image block shall not be marked as region of moving object.
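The accumulation and first-threshold check of (S100) can be sketched as follows. This is a sketch under stated assumptions rather than a definitive implementation: the specification does not define how motion vectors are combined, so the sum of Euclidean magnitudes used here is an assumption, and the names `window_frames`, `accumulate_motion` and `mark_moving_blocks` as well as the `(row, col)` block keys are hypothetical.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Block = Tuple[int, int]         # (row, col) of an image block (assumed layout)
MotionVector = Tuple[int, int]  # (dx, dy) parsed from the bit-stream


def window_frames(fps: int, period_ms: int) -> int:
    """Number of frames covered by the accumulation time-period."""
    return fps * period_ms // 1000


def accumulate_motion(
    frames: Sequence[Dict[Block, MotionVector]]
) -> Dict[Block, float]:
    """Sum motion vector magnitudes per block over the given frames.

    Using the Euclidean magnitude is an assumption of this sketch.
    """
    totals: Dict[Block, float] = {}
    for frame in frames:
        for block, (dx, dy) in frame.items():
            totals[block] = totals.get(block, 0.0) + math.hypot(dx, dy)
    return totals


def mark_moving_blocks(
    totals: Dict[Block, float], first_threshold: float = 20.0
) -> Set[Block]:
    """Mark blocks whose accumulation is higher than the first threshold."""
    return {block for block, total in totals.items() if total > first_threshold}


if __name__ == "__main__":
    n = window_frames(fps=30, period_ms=500)       # 15 frames
    frames = [{(0, 0): (3, 4), (0, 1): (1, 0)}] * n
    totals = accumulate_motion(frames)
    print(n, totals, mark_moving_blocks(totals))
```

With these toy values, block (0, 0) accumulates 75 and is marked, whereas block (0, 1) accumulates only 15 and stays unmarked even though it has non-zero motion vectors.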
Because the present invention only checks whether regions of moving object are detected in an image frame of a compressed video, (S1300) can be accomplished by only (S100). However, in order to detect regions of moving object more correctly, it is preferable to further perform boundary check (S200) and interpolation (S300), which shall be described below.
Step (S200): Then, for the regions of moving object which have been detected in (S100), the extent of the boundary area is detected by use of motion vector and coding type. Through this procedure, the regions of moving object which were marked in a fragmented pattern in (S100) are clumped up, so that a meaningful lump of image blocks is formed.
In (S100), by applying strict criteria, image blocks which surely correspond to moving objects are selected in the compressed video and marked as region of moving object. In (S200), other image blocks are examined, namely those positioned around the image blocks which were marked as region of moving object in (S100). These are referred to herein as ‘neighboring blocks’ for convenience. These neighboring blocks are checked as to whether they are regions of moving object by looser criteria than in (S100).
In a compressed video, macro blocks and sub-blocks are very small in size. Therefore, if the compressed video shows persons, cars or animals, as a CCTV video typically does, a moving object is unlikely to appear in only one or a few image blocks. Rather, it is expected that the moving object shall appear across several image blocks. That is, it is expected that the probability of including moving objects is higher for image blocks which are positioned around the image blocks which include moving objects than for other image blocks. Reflecting this expectation, in (S200), relatively looser criteria are applied to the neighboring blocks which are positioned around the regions of moving object in checking whether they shall be marked as region of moving object.
In one embodiment, each of the neighboring blocks is examined and is marked as region of moving object when its motion vector is higher than a predetermined second threshold (e.g., 0) or when its coding type is Intra Picture. In another embodiment, each of the neighboring blocks is examined and is marked as region of moving object when the motion vector accumulation of (S100) is higher than a predetermined third threshold (e.g., 5) or when its coding type is Intra Picture. It is logical that the second threshold and the third threshold shall be smaller values than the first threshold.
Conceptually, if an image block having some movement is found in the vicinity of a region of moving object in which substantially effective movements have already been confirmed, it is highly probable that the image block forms a single lump with the region of moving object. That is why the image block is also marked as region of moving object. Further, because the motion vector is unavailable for Intra Picture, it is impractical to examine the neighboring blocks of Intra Picture based on motion vector. In this case, the neighboring blocks of Intra Picture may be marked as region of moving object, so that they form a single lump with the adjacent image blocks which have already been marked as region of moving object. The loss when one image block is wrongly marked as region of moving object is small, whereas the loss when the region of moving object is fragmented is big.
Step (S300): The interpolation is performed on the regions of moving object which were detected in (S100) and (S200) so as to repair fragmentation in the regions of moving object. In the previous procedures, regions of moving object were marked in units of image blocks. Accordingly, even though there is actually a single moving object (e.g., a human, a car), some unmarked image blocks which are sparsely mixed between regions of moving object may fragment the single moving object into a plurality of regions of moving object. These unmarked image blocks may cause the plurality of regions of moving object to be handled as separate moving objects. This fragmentation makes the detection of moving objects inaccurate. Therefore, if one or a small number of unmarked image blocks are found surrounded by a plurality of marked image blocks, they are also marked as region of moving object. In this specification, the procedure is referred to as ‘interpolation’. Further, in this specification, ‘marked image block’ means an image block which has already been marked as region of moving object, and ‘unmarked image block’ means an image block which is not yet marked as region of moving object. Through this procedure, a plurality of regions of moving object can be clumped up so as to form a single lump as shown in
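The interpolation of (S300) can be sketched as filling small enclosed holes in the grid of marked image blocks. The specification does not define ‘surrounded’ precisely, so this sketch assumes that a group of unmarked blocks is surrounded when it is 4-connected, does not touch the border of the block grid, and contains fewer blocks than a threshold. The function name `fill_small_holes` and the parameter `hole_threshold` are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Block = Tuple[int, int]  # (row, col) of an image block (assumed layout)


def fill_small_holes(
    marked: Set[Block], rows: int, cols: int, hole_threshold: int = 3
) -> Set[Block]:
    """Mark small groups of unmarked blocks that are enclosed by marked blocks.

    Assumption of this sketch: a group is 'surrounded' if it is a 4-connected
    set of unmarked blocks that never reaches the border of the block grid.
    """
    result = set(marked)
    seen: Set[Block] = set()
    for r in range(rows):
        for c in range(cols):
            start = (r, c)
            if start in marked or start in seen:
                continue
            # Breadth-first search over one group of unmarked blocks.
            group: List[Block] = []
            touches_border = False
            queue = deque([start])
            seen.add(start)
            while queue:
                cr, cc = queue.popleft()
                group.append((cr, cc))
                if cr in (0, rows - 1) or cc in (0, cols - 1):
                    touches_border = True
                for nxt in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    in_grid = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    if in_grid and nxt not in marked and nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
            # Only small enclosed groups are interpolated.
            if not touches_border and len(group) < hole_threshold:
                result.update(group)
    return result


if __name__ == "__main__":
    ring = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3)}
    print(sorted(fill_small_holes(ring, rows=5, cols=5)))
```

In the example, the single unmarked block at (2, 2) is enclosed by a ring of marked blocks and is therefore marked, while the unmarked blocks outside the ring reach the grid border and are left untouched.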
Comparing
Step (S400): One or more regions of moving object were obtained from a compressed video through (S100) to (S300). In each of (S100) to (S300), it was checked whether each of the image blocks belongs to a region of moving object, and the block was marked accordingly. In the end, however, the image blocks which were marked as region of moving object are clumped together so as to form a lump of image blocks. The lump of image blocks, each of which was marked as region of moving object, shall be treated as a region of moving object. As shown in
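The clumping of (S400) can be sketched as grouping marked image blocks into connected lumps. This is only an illustrative sketch: the specification does not state which adjacency rule defines a lump, so 8-connectivity (including diagonal neighbors) is an assumption here, and the function name `clump_regions` is hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Block = Tuple[int, int]  # (row, col) of an image block (assumed layout)


def clump_regions(marked: Set[Block]) -> List[Set[Block]]:
    """Group marked blocks into lumps of mutually adjacent blocks.

    8-connectivity is an assumption of this sketch. The lumps are returned
    sorted by their smallest block so that the output order is deterministic.
    """
    remaining = set(marked)
    lumps: List[Set[Block]] = []
    while remaining:
        seed = remaining.pop()
        lump = {seed}
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbor = (r + dr, c + dc)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        lump.add(neighbor)
                        queue.append(neighbor)
        lumps.append(lump)
    return sorted(lumps, key=min)


if __name__ == "__main__":
    marked = {(0, 0), (0, 1), (1, 1), (4, 4)}
    for lump in clump_regions(marked):
        print(sorted(lump))
```

Each returned lump corresponds to one region of moving object; in the example, three adjacent blocks form one region and the isolated block at (4, 4) forms another.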
Step (S110): Firstly, the bit-stream of the compressed video is parsed so as to obtain motion vector and coding type for coding unit. Referring to
Step (S120): The motion vector accumulation for a predetermined time-period (e.g., 500 ms) is obtained for each of a plurality of image blocks which constitute the compressed video. This step is proposed in order to detect any substantially meaningful movement (i.e., effective movement) in the compressed video, e.g., moving cars, running people, and crowds fighting each other. Objects with substantially meaningless movement are not to be detected, e.g., shaking leaves, temporary ghosts, and shadows that change slightly with the reflection of light. For this purpose, motion vectors are accumulated for a predetermined time-period (e.g., 500 msec) in units of image blocks. In this specification, the term ‘image block’ may include macro blocks and sub-blocks.
Steps (S130, S140): For the plurality of image blocks, the motion vector accumulation is compared with a predetermined first threshold (e.g., 20). Then, image blocks whose motion vector accumulation is higher than the first threshold are marked as regions of moving object. That is, when an image block having such a large motion vector accumulation is found, it is presumed that some substantially meaningful movement (i.e., an effective movement) has been found in that image block. For example, any movement to which monitoring agents of a video surveillance system should pay attention, e.g., a person who is running, may be selectively detected. On the other hand, any motion vector whose accumulation value for the specific time-period fails to be higher than the first threshold shall be ignored in the detecting procedure, on the estimation that the change in video is rather small.
Referring to
Step (S210): First, a plurality of image blocks are identified which are located adjacent to the image blocks which were marked as region of moving object in (S100). For convenience, they are referred to as ‘neighboring blocks’ in this specification. These neighboring blocks were not marked as region of moving object in (S100). In the procedure of
Steps (S220, S230): The values of motion vectors of the neighboring blocks are compared with a predetermined second threshold (e.g., 0). Then, some of the neighboring blocks whose motion vector is higher than the second threshold shall be marked as region of moving object. It shall be reminded that substantially effective movements have been confirmed in the regions of moving object in (S100). Therefore, if some movement is found in image blocks which are located adjacent around a region of moving object, when considering the characteristics of shooting video (e.g., CCTV video), the image blocks are likely to be a single lump with the adjacent region of moving object. That is why these neighboring blocks are also marked as region of moving object.
In one embodiment, each of the neighboring blocks is examined and is marked as region of moving object when its motion vector is higher than a predetermined second threshold (e.g., 0). In another embodiment, each of the neighboring blocks is examined and is marked as region of moving object when the motion vector accumulation of (S100) is higher than a predetermined third threshold (e.g., 5). It is logical that the second threshold and the third threshold shall be smaller values than the first threshold.
Step (S240): Further, some of the neighboring blocks whose coding type is Intra Picture shall be marked as region of moving object. Because the motion vector is unavailable for Intra Picture, it is impractical to examine the neighboring blocks of Intra Picture based on motion vector as in (S220) and (S230). In this case, it is preferable to mark the neighboring blocks of Intra Picture as region of moving object, so that they form a single lump with the adjacent image blocks which have already been marked as region of moving object. The loss when one image block is wrongly marked as region of moving object is small, whereas the loss when the region of moving object is fragmented is big.
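The boundary check of steps (S210) to (S240) can be sketched as a single pass over the neighbors of the blocks marked in (S100). This is a sketch under stated assumptions: the 8-neighborhood, the use of a per-block motion magnitude, and the names `expand_to_neighbors`, `motion` and `intra_blocks` are hypothetical and are not taken from this specification.

```python
from typing import Dict, Set, Tuple

Block = Tuple[int, int]  # (row, col) of an image block (assumed layout)


def expand_to_neighbors(
    marked: Set[Block],
    motion: Dict[Block, float],
    intra_blocks: Set[Block],
    rows: int,
    cols: int,
    second_threshold: float = 0.0,
) -> Set[Block]:
    """Apply the looser (S200) criteria to neighbors of already marked blocks.

    motion maps a block to its motion vector magnitude (assumed measure);
    intra_blocks holds blocks whose coding type is Intra Picture.
    """
    result = set(marked)
    for r, c in marked:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                neighbor = (r + dr, c + dc)  # (S210) 8-neighborhood, assumed
                in_grid = 0 <= neighbor[0] < rows and 0 <= neighbor[1] < cols
                if not in_grid or neighbor in marked:
                    continue
                # (S220, S230) some movement, or (S240) Intra Picture block.
                if (
                    neighbor in intra_blocks
                    or motion.get(neighbor, 0.0) > second_threshold
                ):
                    result.add(neighbor)
    return result


if __name__ == "__main__":
    marked = {(1, 1)}
    motion = {(1, 2): 0.5, (0, 0): 0.0, (3, 3): 9.0}
    intra_blocks = {(2, 1)}
    print(sorted(expand_to_neighbors(marked, motion, intra_blocks, 4, 4)))
```

In the example, block (3, 3) has a large motion value but is not adjacent to a marked block, so it stays unmarked; only neighbors of the blocks marked in (S100) benefit from the looser criteria.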
The present invention may provide an advantage of reducing the computing resources required for analyzing a compressed video (e.g., CCTV video) by selectively performing image analysis. In particular, the present invention may provide roughly 5 times the channel capacity of conventional video analysis systems, because it requires roughly ⅕ of the computing resources to analyze a compressed video.
Meanwhile, the present invention can be implemented in the form of a computer-readable code on a non-transitory computer-readable medium. Various types of storage devices exist as the non-transitory computer-readable medium, such as hard disks, SSDs, CD-ROMs, NAS, magnetic tapes, web disks, and cloud disks. The codes may be distributed, stored, and executed in multiple storage devices which are connected through a network. Further, the present invention may be implemented in the form of a computer program stored in a medium in order to execute a specific procedure by being combined with hardware.
Number | Date | Country | Kind |
---|---|---|---|
10-2021-0154880 | Nov 2021 | KR | national |