1. Field of Invention
The present invention relates to high-definition video surveillance systems.
2. Background
In high definition (HD) video surveillance systems, typically one video recorder is connected with multiple cameras via cable networks. The data flow from camera to video recorder is called downstream, while the data flow from the video recorder to camera side is called upstream. In the video recorder, the downstream videos from cameras that capture live scenes in the field of view of cameras are displayed instantly to monitor and also recorded for future playback. The videos for instant displaying are called live-view videos and the videos for recording are called recording-view videos respectively. In some systems, the live-view video and recording-view video are two different video streams. In other systems, the same video stream is used for both live-view and recording-view.
The video recorder often uses an HD monitor to display the live-view videos from multiple cameras on a single screen, with each video occupying a small area of the whole monitor screen. This monitor is called the primary monitor and its screen is called the primary screen. The area of the screen in which a video is displayed is called a video window. In the video surveillance industry, it is common to display 4, 5, 6, 9, 10 or 16 videos on a single screen simultaneously as shown in
Consider a common example application of a conventional full high definition (FHD) video surveillance system, in which 16 FHD cameras are connected to an FHD video recorder with one FHD monitor, and all 16 FHD source videos from the 16 cameras are sent to the video recorder and displayed on the FHD screen of its monitor.
In usual situations, the operator needs to see all 16 FHD source videos simultaneously and at equal size in the 16-split screen on the single FHD monitor. Each source video has FHD resolution, while it is to be displayed in a video window of 1/16 FHD resolution. Clearly, the FHD source video cannot fit into a displaying video window of 1/16 FHD. Therefore, each FHD source video is always downscaled and/or cropped into a displaying video of 1/16 FHD, and all 16 resulting videos are combined into one video that is then displayed on the FHD monitor screen.
In some situations, the operator needs to see the details of one or several selected source videos in video windows larger than 1/16 FHD. In order for the combined video to fit into the same monitor screen, some other source videos need to be displayed in smaller video windows. An FHD source video still cannot fit into any of these displaying video windows, whatever their sizes. Therefore, each FHD source video is still downscaled and/or cropped into a displaying video of the corresponding size. Accordingly, all 16 resulting videos are combined into one video that is then displayed on the FHD monitor screen.
In some other situations, the operator needs to see the full details of one selected FHD source video on the whole FHD monitor screen. This is called full screen display. Each pixel of the selected source video is displayed as one pixel on the screen, so that all FHD pixels in the selected source video are displayed on the FHD monitor screen. At the same time, however, the other 15 FHD source videos are not displayed at all.
It can be seen from the above example application that, in all of these situations, although 16 FHD source videos are carried from the cameras to the video recorder, only one FHD display video with a total of 1920×1080 visible displayed pixels is produced and displayed for live-view monitoring.
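For illustration only, the geometry of the 16-split view described above can be sketched as follows. The function name split_windows and its parameters are hypothetical and are not part of the described system; the sketch merely assumes an evenly divided grid on a 1920×1080 screen.

```python
def split_windows(screen_w, screen_h, grid):
    """Return one (x, y, width, height) tuple per video window, row by row.

    Assumes an evenly divided grid x grid split screen (hypothetical helper).
    """
    win_w, win_h = screen_w // grid, screen_h // grid
    return [(col * win_w, row * win_h, win_w, win_h)
            for row in range(grid)
            for col in range(grid)]


# 16-split view on an FHD screen: a 4 x 4 grid of video windows.
windows = split_windows(1920, 1080, 4)
_, _, win_w, win_h = windows[0]

print(len(windows), win_w, win_h)      # 16 480 270
print(1920 // win_w, 1080 // win_h)    # 4 4  (downscale factor per axis)
```

Under these assumptions each FHD source video is reduced by a factor of 4 in each dimension, and the 16 windows together cover exactly the 1920×1080 pixels of the screen.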
In order to meet the operator's requirements in various situations, the video recorder needs to have the capability of displaying the source video from each camera at varying resolution and size, from coarse resolution used to provide a whole view in a miniature sized video window on split screen mode to full resolution as used in full screen mode. To achieve this capability, the conventional system carries source video at full resolution and full size from each camera to video recorder. When the source video reaches the video recorder, the video can be downscaled and displayed on the screen with the desired resolution, such as 16-split view or full screen view, by cropping, scaling and filtering the source video.
Two conventional systems exist that can achieve the aforementioned goals: one is the HD-SDI (High Definition Serial Digital Interface) camera based system and the other is the HD IP (Internet Protocol) camera based system. In both systems, each camera transmits video with full resolution from the camera side to the monitor side. The HD-SDI systems transmit uncompressed or lightly compressed high definition videos to the video recorder without IP packetizing. In contrast, the HD IP cameras transmit heavily compressed high definition videos over IP to the video recorder.
Both of these systems have their own advantages and disadvantages. Since uncompressed or lightly compressed HD video is transmitted, the HD-SDI system can achieve near-zero latency live view, which is important for time sensitive applications. However, the HD-SDI system requires huge bandwidth for video transmission, and requires heavyweight video compression for recording and/or IP packetizing for internet access in the video recorder. In IP camera based systems, while IP video is well suited for recording and internet access, it is essentially not suited for live-view monitoring because of the high computational cost of heavyweight decompression and the latency resulting from video compression/decompression and IP traffic handling.
There is also a common difficulty in the two systems when modern video surveillance systems migrate to high definition. As is well known, it requires a lot of computation power and hardware resources to compress or decompress and display each HD video. Further in both systems, as each camera carries an HD video to the video recorder, it becomes computationally costly for the multi-channel video recorder to compress or decompress and display a large number of HD videos, e.g., 16 HD videos, simultaneously.
Accordingly, it is desirable to combine the advantages of both systems, that is, to provide up to full resolution and full screen video with near-zero latency for live-view in addition to IP video well suited for recording. It is also desirable to resolve the problems of HD live-view for multiple video cameras at low cost.
The present invention presents a smart dual-view video surveillance system, which carries a smart live-view video and a dedicated recording-view video from each camera to the video recorder. The smart live-view video only carries video data for the visible displayed pixels in its displaying video window and is dedicated for live-view monitoring. The dedicated recording-view video carries the complete video data and is dedicated to video recording and playback.
In one aspect of the present invention, the system converts each source video to displayed video at the camera side and only transmits the portion of video that is visible in the displaying window on the monitor screen, while at the same time each video remains capable of being displayed in full screen at full resolution as in existing systems.
Compared to the conventional HD-SDI based live monitoring system, the present invention significantly reduces the total bit rate required for the live-view videos from all cameras. In the example application above, the total number of visible displayed pixels on the monitor from all 16 cameras combined is no more than 1920×1080. Therefore, the smart live-view video only needs to carry at most 1920×1080 pixels per frame from all 16 cameras combined. In comparison, the 16 channel HD-SDI system needs 16 times the bit rate to achieve live-view monitoring.
Compared to the conventional IP based live monitoring system, the present invention significantly reduces the decoding cost required for the live-view videos from all cameras. In the example application above, the total number of visible displayed pixels on the monitor from all 16 cameras combined is no more than 1920×1080. Therefore, as few as one FHD lightweight decoder is enough to decompress the smart live-view videos from all 16 cameras. In comparison, the 16 channel IP system needs 16 heavyweight video decoders to decompress the 16 source videos from all cameras to achieve live-view monitoring. On the recording-view side, since only playback of recorded video needs a heavyweight decoder, and the operator typically traces back one video at a time, one heavyweight decoder may suffice for the whole smart dual-video recorder. The video decoder bottleneck of the conventional IP system is thus eliminated.
In another aspect of the present invention, a method is provided to enable transmission of only the visible displayed video. A smart live-view processor uses a reliable protocol to constantly obtain from the smart video recorder the parameters such as the pan offset, tilt offset, zoom ratio, the displaying video window size, and the visibility of pixels in displaying video window. Based on such information, neither the full resolution full size frame in source videos from most cameras nor the invisible portion of the video in the displaying video window is carried in smart live-view video from the camera side to the monitor side.
In another aspect of the present invention, the live-view video uses lightweight or no compression. The live-view video can further be carried over layer 1 or 2 of OSI (Open System Interconnection) model, below IP layer to remove IP-related complexity and latency, ease installation and improve display quality. In contrast, the recording-view uses heavyweight compression to reduce the transmission data rate. The recording-view video is often carried over IP.
In another aspect of the present invention, the video decoding resources in the video decoder can be shared among the live-view videos of all cameras, thus reducing the system cost.
The principle and embodiments of the present invention will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the invention. Notably, the figures and examples below are not meant to limit the scope of the present invention to a single embodiment but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or like parts. Where certain elements of these embodiments can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present invention will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the invention. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the invention is intended to encompass other embodiments including a plurality of the same component, and vice versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the present invention encompasses present and future known equivalents to the components referred to herein by way of illustration.
In a certain embodiment, the video surveillance system may require the video recorder to display videos in a popup window, which displays the selected video on top of other video windows. Such a video recorder supports multiple display layers. A video window in an upper display layer may overlap video windows in the lower layers. The popup window size is not limited by the split-screen pattern and thus can vary from small thumbnail size to large full-screen size.
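By way of a non-limiting sketch, the number of pixels of a lower-layer video window that remain visible beneath a popup window may be counted as shown below. The Window type, its field names and the function visible_pixel_count are assumptions introduced for this illustration only; the specification does not prescribe how visibility is computed.

```python
from typing import NamedTuple


class Window(NamedTuple):
    """Hypothetical description of a video window on the screen."""
    x: int
    y: int
    w: int
    h: int
    layer: int  # a higher number is drawn on top


def visible_pixel_count(win, all_windows):
    """Count pixels of `win` not covered by any window in a higher layer."""
    uppers = [o for o in all_windows if o.layer > win.layer]
    count = 0
    for py in range(win.y, win.y + win.h):
        for px in range(win.x, win.x + win.w):
            covered = any(o.x <= px < o.x + o.w and o.y <= py < o.y + o.h
                          for o in uppers)
            if not covered:
                count += 1
    return count


# A 480x270 window in the lower layer, partly covered by a popup window.
lower = Window(x=0, y=0, w=480, h=270, layer=0)
popup = Window(x=240, y=135, w=480, h=270, layer=1)
screen = [lower, popup]

print(visible_pixel_count(lower, screen))  # 97200 (129600 minus a 240x135 overlap)
print(visible_pixel_count(popup, screen))  # 129600 (nothing lies above the popup)
```

Counting pixel by pixel is chosen here only for clarity; it handles several overlapping upper windows without separately computing the union of their overlaps.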
According to the principle of the present invention, the smart live-view video carries video data only for the visible displayed pixels in its displaying window. As shown in
In the example of
In some embodiments, one or several videos are displayed on other, secondary monitors with full resolution. However, in most large scale systems, it is not necessary to display all videos in full screen at full resolution.
Thus, the total number of visible video pixels on the screen is much less than that of all full resolution videos combined. For example, for 16 cameras with the resolution of 1920×1080, the monitors may only need to display 2×1920×1080 pixels on the screen for all videos, much less than 16×1920×1080 pixels contained in a frame picture of all 16 cameras combined.
Note that the present invention also transmits a recording view video under advanced heavyweight compression, whose compression ratio can reach 100:1 or even 200:1. The combined bit rate of the live-view video and the recording view video is still significantly lower than that of the traditional HD-SDI systems, where HD videos of all cameras are carried from the cameras to the monitor.
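The comparison above can be made concrete with a small calculation, offered only as a sketch under stated assumptions: every camera and every monitor is FHD, the live-view video is counted at its worst case of one uncompressed FHD stream per monitor screen, and bit rates are expressed in units of one uncompressed FHD stream. The function relative_bit_rates and its parameter names are hypothetical.

```python
def relative_bit_rates(num_cameras, num_screens, recording_ratio):
    """Return (hd_sdi, smart_dual_view) in units of one uncompressed FHD stream.

    Assumptions (not taken from the specification): uncompressed HD-SDI,
    worst-case uncompressed live view, and a recording-view stream compressed
    by `recording_ratio` to 1 for every camera.
    """
    hd_sdi = float(num_cameras)
    live_view = float(min(num_screens, num_cameras))
    recording_view = num_cameras / recording_ratio
    return hd_sdi, live_view + recording_view


# 16 cameras, a primary and a secondary monitor, 100:1 recording compression.
hd_sdi, dual_view = relative_bit_rates(16, 2, 100)
print(hd_sdi, round(dual_view, 2))
```

With these figures the live-view video contributes at most 2 units and the 16 recording-view streams together contribute 0.16 units, against 16 units for the HD-SDI system. Lightweight compression of either system's live video would change the absolute numbers but not the direction of the comparison.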
Compared to IP camera based systems, the present invention has the advantages of low latency, easy installation and better video quality. Firstly, the smart live-view video of the present invention preserves the low latency of lightly compressed video. The long latency of the heavily compressed recording-view video is acceptable for video recording and playback. Secondly, since the smart live-view video is not transported over IP, no complicated IP configuration and handling is required, and plug and play can be easily supported by the live-view video. These features significantly reduce the difficulty of installation and troubleshooting. Thirdly, due to the bursty nature of IP traffic and the presence of external interference, extra packet delay resulting from packet loss is common in IP video streaming. The extra delay may cause video freeze and/or video jump in the conventional network video surveillance system, which relies heavily on the compressed IP stream to recover the live-view video. The video quality issues caused by IP traffic jitter are eliminated in the smart live view of the present invention, which uses no IP protocol in a certain embodiment. The playback video can have desirable quality as long as all the necessary IP packets are eventually delivered and stored.
The recording-view processor 430 can send signal 432 to control the lens system. The control signal 432 may include auto-focus control, iris control and PTZ (Pan-Tilt-Zoom) control. Some control signals, such as the PTZ control signal, may originate from the SVR and may be carried over IP packets or over separate wiring such as RS-485.
The raw digital video 421 also enters the smart live-view processor 450, which produces a live-view stream 451 carrying video data only for the visible displayed pixels in its display window according to the control signal 462. The SVR 340 is aware of the resolution, size, and visibility of the displayed video on the monitor screen. Such information is sent back to the smart live-view processor 450 via the return path of the communication channel. Cropping, scaling, masking and other techniques are applied to the raw digital video 421 to obtain a video comprising only the visible displayed pixels in the display window on the monitor screen.
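As a minimal sketch of the cropping, scaling and masking steps named above, the following operates on a frame held as a list of pixel rows. The helper names crop, downscale_nearest and mask_hidden are hypothetical, nearest-neighbour sampling stands in for whatever scaling and filtering an actual embodiment uses, and masked pixels are marked None simply to indicate that they would not be transmitted.

```python
def crop(frame, x, y, w, h):
    """Keep the w x h region whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]


def downscale_nearest(frame, out_w, out_h):
    """Nearest-neighbour downscale (a simplification; no filtering applied)."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]


def mask_hidden(frame, hx, hy, hw, hh):
    """Mark pixels inside the hidden rectangle as None (not transmitted)."""
    return [[None if (hx <= c < hx + hw and hy <= r < hy + hh) else px
             for c, px in enumerate(row)]
            for r, row in enumerate(frame)]


# Toy 8x8 "raw video" frame whose pixel value encodes its position.
source = [[r * 8 + c for c in range(8)] for r in range(8)]

window = downscale_nearest(crop(source, 2, 2, 4, 4), 2, 2)
live_view = mask_hidden(window, 1, 1, 1, 1)
payload = [px for row in live_view for px in row if px is not None]

print(window)     # [[18, 20], [34, 36]]
print(live_view)  # [[18, 20], [34, None]]
print(payload)    # [18, 20, 34]
```

Only the three pixels in payload would need to be carried for this frame, which is the sense in which the live-view stream contains data solely for visible displayed pixels.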
In a certain embodiment, the recording-view video is streamed over IP and the live-view stream is not. Thus, a hybrid camera-side modem 460 is required to carry both streams. The recording-view video and the live-view video are multiplexed together with other downstream information by the camera-side modem 460 and sent to the SVR 340 over communication channel 461. Meanwhile, the camera-side modem 460 also receives the upstream signal for the recording-view stream and the smart live-view controlling signal 462. Modem 460 further de-multiplexes these upstream signals and sends them to the recording-view processor 430 or the smart live-view processor 450. In a certain embodiment, some information included in the smart live-view controlling signal may be carried over a separate physical communication link. For example, the ePTZ control signal can be carried over a conventional RS-485 cable.
The details of an embodiment of the smart live-view processor 500 are shown in
To reduce the required bit rate, we can remove the video content at the output of the ePTZ controller 520 in the invisible area of the displayed video. This is achieved by the live-view masking block 530. In
For the ePTZ and the masking function to work properly, the smart live-view processor 450 needs a reliable protocol to constantly obtain from the SVR the parameters such as the picture width 614 and height 615, offset 612 and offset 613, the zoom ratio of the source picture size and the displaying video window size, and the information of the invisible area. Based on such information, the information of the invisible video pixels in the display window is not carried from the camera side to the monitor side, and thus the present invention significantly reduces the total bit rate and the decoding complexity.
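The specification does not define the format of the parameter messages. Purely as an assumed illustration, the sketch below packs a set of display parameters into a fixed-size message protected by a CRC-32 checksum, so that the camera side can reject a corrupted update. The LiveViewParams fields, the byte layout and the use of a checksum are all hypothetical, and the description of the invisible area is omitted for brevity.

```python
import struct
import zlib
from dataclasses import dataclass

_FMT = "!6H"  # six unsigned 16-bit fields, network byte order (assumed layout)


@dataclass(frozen=True)
class LiveViewParams:
    """Hypothetical parameter set sent from the SVR to one camera."""
    offset_x: int
    offset_y: int
    picture_w: int
    picture_h: int
    window_w: int
    window_h: int


def encode(p):
    """Serialize the parameters and append a CRC-32 of the body."""
    body = struct.pack(_FMT, p.offset_x, p.offset_y, p.picture_w,
                       p.picture_h, p.window_w, p.window_h)
    return body + struct.pack("!I", zlib.crc32(body))


def decode(msg):
    """Parse a message, raising ValueError if the checksum does not match."""
    body, (crc,) = msg[:-4], struct.unpack("!I", msg[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("corrupted live-view parameter message")
    return LiveViewParams(*struct.unpack(_FMT, body))


params = LiveViewParams(offset_x=480, offset_y=270, picture_w=960,
                        picture_h=540, window_w=480, window_h=270)
message = encode(params)
print(len(message))                         # 16
print(decode(message) == params)            # True
print(params.picture_w / params.window_w)   # 2.0 (zoom ratio, derived)
```

In this sketch the zoom ratio is derived from the picture and window sizes rather than transmitted separately; an actual protocol might also add acknowledgement and retransmission to achieve reliability.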
The modems 710, 730 may also exchange data with the computer system 790 of the SVR via signals 713, 733. The recording-view video traffic from all cameras is sent to the computer system 790 for IP protocol handling and video recording. In a certain embodiment, the live-view video can also be sent to the computer system 790 and saved in the backup storage system. In the other direction, the returning IP packets and some control signals such as PTZ information may be sent to the modems 710, 730. Further, the information for the live-view video, such as window position, visible area and window size, is collected from the display controller 750, and the computer system sends such information to the modems according to the protocol known to both the camera and the SVR.
The display controller is responsible for generating the monitor display by combining the output of the live-view decoder 720 for live view, the output of the digital video decoders 760, 761 for playback view, and the graphics signal 751 for computer graphics. The computer graphics may include a company logo, text, user control buttons, a split view mosaic, etc. The digital decoders 760, 761 decode the recording-view videos stored in the computer system 790 for playing back previously stored scenes. Since the recording-view video may be in HD format, the digital decoders 760, 761 are costly HD decoders. With a powerful display controller, live-view video, playback view, and computer graphics can be combined and displayed on the primary and/or the secondary monitors in various forms with various features, such as split view, popup view and alpha compositing.
The display controller also maintains the video window information, such as video position, video size, overlapped area, visibility of the overlapped area and video zoom ratio. Such information is usually generated by the computer system according to input from the operator. In some embodiments, touch screen monitors are used and the display controller obtains the video window information from the monitors 741 and 742. The video window information is processed by the computer system and sent back to the camera side according to the predefined protocol. Each camera receives its own display window information and generates the live-view video accordingly.
The present invention is described according to the accompanying drawings. It is to be understood that the present invention is not limited to such embodiments. Modifications and variations could be effected by those skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.
This application refers to the prior provisional application under application No. US/61,717,985 filed on Oct. 24, 2012.