The present invention generally relates to a video surveillance system and video information annotation and de-annotation technologies thereof.
Early video surveillance systems used analog monitors with video cassette recorders (VCRs), and an operator had to manually find and review the appropriate recorded tape after an event occurred. Current video surveillance systems, in contrast, are digital. An IP camera serves as the monitoring device, and surveillance video images are converted into digital information and transmitted over the network to the system back-end. The system back-end replaces the tape-based VCR with a digital video recorder (DVR), which accelerates information search and makes data storage more convenient. Because image extraction and image storage are digitized, security surveillance applications have been widely promoted. In event processing, however, only a time index is available for manual evaluation after an event has occurred. If the image content can be identified and analyzed during the image extraction process, and the image information can be integrated with information from external sensors and annotated in the image content, subsequent processing and search may be based on time index, event, image content, or person. Furthermore, the system may directly determine the occurrence of events, issue warning messages, automatically record the images, and execute subsequent processing.
U.S. Pat. No. 6,928,165 disclosed a communication system that uses a multiplexer and time division or frequency division schemes to transmit the image and the add-on information separately through a transmission interface, and adds a digital watermark to the image to describe the relation between the image and the add-on information. However, the image and the add-on information are not integrated into a format identical to the original image for transmission. Therefore, an additional synchronization mechanism is required for the add-on information and the image.
U.S. Pat. No. 6,952,236 disclosed a technique for converting text embedded in a video stream, in which the information is hidden in the line-break interval of the scanning lines. After the information is extracted, a converter converts the text data format to a format matching the European or American system. Such a text-embedding architecture may be applicable to scanning display systems. After data compression, however, the hidden information no longer exists; therefore, the technique is not applicable to current video surveillance systems based on IP cameras and the like.
U.S. Pat. No. 7,050,604 disclosed related methods that use a watermark to embed object-specific information into video. After transmission, the video information embedded with the watermark is decoded to obtain the separate video information and object information. The document describes neither the image compression and decompression process nor how to guarantee the data integrity of the obtained object information.
Among the aforementioned and current security surveillance systems, some systems include only image extraction capability at the front-end, and transmit the additional information to the back-end for processing. In this manner, the image and the data are transmitted separately. An additional data transmission interface is therefore required, and the data and the image must be synchronized, which increases system complexity. Other systems annotate text information, such as system time and location, directly onto the image. This type of annotation information is fixed, and either the format of the annotated image is changed or the annotated image cannot be recovered as the original image.
In an exemplary embodiment, the disclosure is directed to a video surveillance system, comprising an image feature extraction module, an image information annotation module, an image compression module, an image decompression module, and an image information de-annotation module. The image feature extraction module captures at least an image as the original image, and extracts the feature information of the original image. The image information annotation module embeds the annotation information, including at least the feature information, into the original image, and converts the embedded image into the same format as the original image. The annotated image is compressed by the image compression module. The image decompression module decompresses the compressed image, and the image information de-annotation module extracts the embedded information from the decompressed embedded stream. After error recovery decoding and image processing, the object image and the annotation information are separated.
In another exemplary embodiment, the disclosure is directed to an image information annotation module, comprising an error recovery processing unit, an image processing unit, and an annotation unit. The error recovery processing unit encodes the annotation information into embedded information through an error recovery encoding method and a threshold computation. The image processing unit processes the original image to compute the capacity for the embedded information. The annotation unit embeds L copies of the encoded embedded information into the original image, where L is the embedded information multiple. The embedded image is converted into another image of the same format as the original image.
In another exemplary embodiment, the disclosure is directed to an image information de-annotation module, comprising a de-annotation unit and an image processing unit. The de-annotation unit extracts the embedded information from the embedded stream, and obtains de-annotation information through error recovery processing. The de-annotation information is the intact recovery of the annotation information. The image processing unit may separate the image and the de-annotation information.
In yet another exemplary embodiment, the disclosure is directed to a video annotation method, comprising: embedding annotation information, including at least the feature information of an original image, into the original image; and converting the embedded image into the same format as the original image.
In yet another exemplary embodiment, the disclosure is directed to a video de-annotation method, comprising: extracting embedded information from an embedded stream; obtaining de-annotation information from the extracted embedded information, the de-annotation information being the intact recovery of the annotation information; and obtaining video information and environmental parameter information of an object image through image processing.
The foregoing and other features, aspects and advantages of the present invention will become better understood from a careful reading of a detailed description provided herein below with appropriate reference to the accompanying drawings.
The disclosed embodiments may provide a technique for video annotation and de-annotation, and a video surveillance system for embedding information in the video stream. The video annotation technique adds additional information, such as image feature information and annotation information from externally sensed signals, to the original image, and converts the result into an image of the same format as the original image. The video de-annotation technique may cooperate with current image processing techniques to recover the original image from the annotated image and guarantee the integrity of the annotation information.
According to the exemplary embodiments disclosed in the present invention, the video information annotation technique adds the additional information into the original image and converts the embedded image into the same format as the original image for subsequent processing by video surveillance systems. The additional information may be object features, movements, or relations from the original image, environmental parameters obtained from external sensors, and so on. Because the video format is not changed, subsequent processes, such as compression, transmission, decompression, and storage, are not affected. The video information de-annotation technique uses de-annotation and decoding techniques that match those of the system front-end to obtain the annotation information from the image, and uses the error recovery encoding technique and threshold scheme to perform fast recovery and guarantee the integrity of the de-annotated annotation information.
Image extraction module 110 captures at least a video source to obtain an original image 110a and extracts feature information 110b of original image 110a, such as the location of a moving object, whether a human intrusion has occurred, the feature parameters of the intruding person, and so on. The data amount of feature information 110b is far less than the data amount of original image 110a. The extraction may be accomplished by image processing or by recognition and identification schemes. Image information annotation module 120 embeds annotation information 120a into original image 110a. Annotation information 120a includes at least feature information 110b. For example, annotation information 120a may include feature information 110b and environmental parameter information 111. Image information annotation module 120 converts the embedded image into another image of the same format as original image 110a, i.e., embedded image 120b. Because the image format is not changed, embedded image 120b may be compressed by image compression module 130, or streamed to a remote site or storage devices.
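The format-preserving property described above can be illustrated with a minimal sketch. It assumes a raw frame held as a byte string of 8-bit samples and swaps in least-significant-bit substitution as the embedding method purely for illustration. Plain LSB substitution is not robust to lossy compression, so it does not by itself provide the recovery guarantee described in this disclosure. The function names and sample values are hypothetical.

```python
def embed_bits(frame: bytes, bits: list[int]) -> bytes:
    """Write each payload bit into the least significant bit of one sample.

    Illustrative only: the frame keeps its length and sample type, so it
    remains a frame of the same format as the input.
    """
    if len(bits) > len(frame):
        raise ValueError("payload exceeds frame capacity")
    out = bytearray(frame)
    for k, bit in enumerate(bits):
        out[k] = (out[k] & 0xFE) | (bit & 1)
    return bytes(out)


def extract_bits(frame: bytes, count: int) -> list[int]:
    """Read back the first `count` embedded bits."""
    return [frame[k] & 1 for k in range(count)]


# Hypothetical data: 32 samples and an 8-bit annotation payload.
frame = bytes(range(16, 48))
payload = [1, 0, 1, 1, 0, 0, 1, 0]
embedded = embed_bits(frame, payload)
```

In this sketch the embedded frame has the same size and sample layout as the input, and each sample changes by at most one level, which is the sense in which a downstream compressor can treat it as an ordinary frame.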
To obtain the original information, image decompression module 140 may decompress compressed image 130b, and image information de-annotation module 150 may extract the embedded information from decompressed embedded stream 140b. After error recovery decoding and image processing on the extracted embedded information, object image 150a and obtained de-annotation information 150b are separated.
For video surveillance system 100, environmental parameter information 111 refers to the information of externally sensed signals obtained from external sensors, such as CO and CO2 detectors, humidity/temperature meters, photo-sensors, radio frequency identification (RFID) devices, timers, flow meters, and so on. The information may be time, temperature, humidity, or other data. The data amount is far less than the data amount of the original image.
Environmental parameter information 111 and feature information 110b may be combined and encoded to become encoded embedded information. The encoding technique may be an error recovery encoding technique, such as Hamming code, BCH Code, Reed-Solomon Code, Reed-Muller Code, Convolutional Code, Turbo Code, Low Density Parity Check (LDPC) Code, Repeat-Accumulate (RA) Code, Space Time Code, Factor Graphs, Soft-decision Decoding, Guruswami-Sudan Decoding, Extrinsic Information Transfer Chart (EXIT Chart), Iterative Decoding, and so on. The encoded embedded information may be embedded in the image stored in the video buffer to become another image with annotation information. The embedding technique may be a non-destructive electronic watermark, On Screen Display (OSD), or covering a specific image area.
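As one concrete instance of the error recovery encoding techniques listed above, the following minimal sketch implements the standard Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. The disclosure does not state which code is used in practice; the function names are hypothetical and the choice of Hamming(7,4) is an assumption made for illustration.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword laid out as
    [p1, p2, d1, p4, d2, d3, d4] (parity bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# Hypothetical 4-bit fragment of annotation information.
word = [1, 0, 1, 1]
sent = hamming74_encode(word)
damaged = list(sent)
damaged[4] ^= 1  # simulate one bit distorted by compression or transmission
recovered = hamming74_decode(damaged)
```

A single-error-correcting code such as this is only sufficient when distortion is sparse; stronger codes from the list above trade more redundancy for more correction capability.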
The length |i| of the embedded information is obtained through the above encoding techniques. Based on the original image, a visually non-destructive method estimates the embeddable information capacity |C|, i.e., the amount of information that is guaranteed to remain intact after the information is embedded in the image and the embedded image is compressed. First, the current network status n, including network latency, loss rate, and so on, is computed. The compression ratio R of the compression module is also computed. The threshold scheme is then used to compute an embedded information multiple L, where L=f(|C|, |i|, n, R). In other words, multiple L depends on embeddable information capacity |C|, embedded information length |i|, network status n, and compression ratio R. L copies of the embedded information are embedded into the image, so that the intact embedded information may be recovered after decompression. The data amount L*|i| may be kept below capacity |C| through the following exemplary schemes: reducing the amount of important information or the number of sensed-information items, which are part of the technical specification of the application, or dividing the data and embedding it in a series of consecutive images.
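The relation L=f(|C|, |i|, n, R) is not given in closed form in this description. The sketch below therefore uses a purely hypothetical rule: request more copies as the loss rate and compression ratio grow, then cap the result so that L*|i| does not exceed |C|. The weighting constants, the function names, and the reduction of network status n to a single loss rate are all assumptions for illustration.

```python
import math


def choose_multiple(capacity_bits, info_bits, loss_rate, compression_ratio):
    """Hypothetical f(|C|, |i|, n, R).

    Returns the number of copies L to embed, capped so L * |i| <= |C|.
    A result of 0 means one copy does not fit in a single image.
    """
    if info_bits <= 0 or capacity_bits < 0:
        raise ValueError("lengths must be positive")
    # Assumed heuristic: harsher channel or compression asks for more copies.
    harshness = loss_rate * 10 + math.log2(max(compression_ratio, 1.0))
    wanted = 1 + 2 * math.ceil(harshness)
    fits = capacity_bits // info_bits
    return min(wanted, fits)


def frames_needed(capacity_bits, info_bits, copies):
    """Consecutive images needed when copies * |i| exceeds one image's |C|."""
    total = copies * info_bits
    return -(-total // capacity_bits)  # integer ceiling division


# Hypothetical figures: |C| = 4096 bits, |i| = 512 bits.
L = choose_multiple(4096, 512, loss_rate=0.05, compression_ratio=8.0)
split = frames_needed(4096, 512, copies=9)
```

When the capped value is smaller than the number of copies wanted, the second helper corresponds to the data-division scheme: the copies are spread over a series of consecutive images instead of being dropped.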
Because the image format is not changed by the annotation information, the compression module may compress the embedded image directly and stream it to remote sites or storage devices. For example, the annotated image may be compressed into Motion JPEG or MPEG-4 format, and transmitted through the network or a specific transmission path to the system back-end.
In real-time security surveillance systems, such as those providing real-time playback, analysis, and warning, the system back-end may decompress the compressed image to obtain the original information. After decompression, the image is restored to its original format, such as YUV, RGB, YCbCr, NTSC, PAL, and so on. After the compression-and-decompression process, the video information is no longer identical to the video information before compression. The system may accept an index parameter to quickly search for video information containing that index parameter, and extract the original embedded information through de-annotation techniques matching those of the system front-end, such as electronic watermarking. Through error recovery decoding techniques matching those of the system front-end, the annotated image feature parameters and environmental parameters may be obtained from the image.
In a non-real-time system, the system back-end may store the information in the database after obtaining the Motion JPEG or MPEG-4 format information. When an index search is requested later, the decompression and de-annotation techniques are used to compare against the object video information. Alternatively, after the system back-end decompresses the image files, the image files in formats such as YUV, RGB, YCbCr, NTSC, or PAL may be stored in the database. When an index search is requested later, the images in the original format, embedded with image feature parameters and environmental parameters, may be compared directly to obtain the target image information.
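The index search over de-annotated parameters can be sketched as a simple filter over stored records. The record layout, field names, and sample values below are hypothetical and stand in for whatever feature and environmental parameters a given deployment embeds; the description does not specify a database schema.

```python
from dataclasses import dataclass


@dataclass
class FrameRecord:
    """De-annotated parameters for one stored image (hypothetical fields)."""
    frame_id: int
    timestamp: str        # time index, kept as text in this sketch
    intrusion: bool       # image feature parameter
    temperature_c: float  # environmental parameter


def search(records, **criteria):
    """Return the records whose fields equal every given index parameter."""
    return [
        r for r in records
        if all(getattr(r, name) == value for name, value in criteria.items())
    ]


# Made-up sample records for illustration only.
records = [
    FrameRecord(1, "2008-06-13T10:00:00", False, 24.5),
    FrameRecord(2, "2008-06-13T10:00:01", True, 24.5),
    FrameRecord(3, "2008-06-13T10:00:02", False, 31.0),
]
hits = search(records, intrusion=True)
```

Because the search runs on the recovered parameters rather than on pixel data, its correctness depends on the de-annotated values matching what the front-end embedded.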
The aforementioned error recovery encoding technique and threshold scheme may compute the amount of backup information. The accurate embedding capacity computation and the error recovery may ensure the integrity of the de-annotated image feature parameters and the environmental parameters as well as the correctness of the index search parameters.
The significant difference between the present invention and the conventional electronic watermark technique is that the watermark technique is mainly used for copyright identification, and requires only a 70% identification rate to be effective. The embedded information according to the present invention, in contrast, is 100% recovered. Therefore, through error recovery processing, the present invention may make the de-annotation information extracted by the system back-end the same as the annotation information of the system front-end.
Referring to
The image with annotation information is compressed by image compression module 130 into Motion JPEG or MPEG-4 format, and transmitted through the network. After receiving the image in Motion JPEG or MPEG-4 format, the back-end of the video surveillance system uses image decompression module 140 to recover the image to the format of original image 110a, such as YUV, RGB, YCbCr, NTSC, PAL, and so on.
After the compression and decompression process, the annotated image is partially distorted. Image information de-annotation module 150 uses an error recovery decoding technique matching that of the system front-end to separate an object image 150a from the annotation information, obtains de-annotation information (I1′, I2′, . . . , In′), and ensures that annotation information (I1, I2, . . . , In) is fully recovered; i.e., I1=I1′, I2=I2′, . . . , In=In′.
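The recovery condition I1=I1′, . . . , In=In′ can be illustrated with a minimal sketch in which several whole copies of the annotation bits are embedded and the back-end takes a per-bit majority vote. Majority voting over repeated copies is an assumption chosen for simplicity; the description only requires an error recovery decoding technique matching the front-end. The function names and bit values are hypothetical.

```python
def embed_copies(bits, copies):
    """Front-end side: concatenate `copies` whole copies of the message."""
    return list(bits) * copies


def majority_decode(received, length, copies):
    """Back-end side: recover each bit by majority vote across the copies."""
    recovered = []
    for k in range(length):
        ones = sum(received[c * length + k] for c in range(copies))
        recovered.append(1 if 2 * ones > copies else 0)
    return recovered


# Hypothetical annotation bits (I1..I5) and three embedded copies.
annotation = [1, 0, 1, 1, 0]
stream = embed_copies(annotation, 3)
stream[1] ^= 1  # distortion in copy 0, bit 1
stream[8] ^= 1  # distortion in copy 1, bit 3
de_annotation = majority_decode(stream, len(annotation), 3)
```

With three copies, each bit survives as long as at most one of its copies is distorted, which is why the number of copies has to grow with the expected distortion.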
Therefore, according to the video surveillance system of the present invention, the image may be annotated with image feature information and externally sensed signals, and the annotation information may be fully recovered. Because the format of the image is not changed, the annotation information is not affected by subsequent compression and transmission. The video surveillance system of the present invention may therefore be combined with current image processing techniques and is compatible with current surveillance systems. The system back-end may quickly recover the annotated original image to provide fast image search for key events and improve security surveillance efficiency.
Although the present invention has been described with reference to the exemplary disclosed embodiments, it will be understood that the invention is not limited to the details described thereof. Various substitutions and modifications have been suggested in the foregoing description, and others will occur to those of ordinary skill in the art. Therefore, all such substitutions and modifications are intended to be embraced within the scope of the invention as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
097122273 | Jun 2008 | TW | national |