This invention relates to video systems allowing simultaneous viewing of live and recorded video content.
Prior art references considered to be relevant as a background to the invention are listed below and their contents are incorporated herein by reference. Acknowledgement of the references herein is not to be inferred as meaning that these are in any way relevant to the patentability of the invention disclosed herein. Each reference is identified by a number enclosed in square brackets and accordingly the prior art will be referred to throughout the specification by numbers enclosed in square brackets.
Video monitoring is widely used in surveillance systems. Its main objective is to monitor activities at the relevant site. Surveillance video can be viewed as live video, which in the context of the invention as defined by the appended claims means displaying video frames in real time, with negligible delay from the time of recording. Video can also be viewed off-line after it has been recorded, in what is called playback video.
Almost every guard monitoring surveillance cameras will face the following dilemma when an event occurs, such as an intruder alarm being triggered. Should he watch the live video to see what the intruder is doing now, or should he watch the playback video to see what the intruder has already done? The dilemma of live vs. playback is so prominent that many large monitoring centers, which are operated by trained and experienced guards, use multiple video screens: some screens for playback video and some screens for live video. But what can be done when only one screen is available?
This problem is addressed in the art. For example, GB 2 326 049 [7] discloses a video surveillance system in which live and previously recorded images may be simultaneously displayed. The surveillance system comprises a plurality of video cameras, a monitor and a video recorder. The video cameras and monitor are controlled by multiplexers that can display multiple cameras on one monitor and also send the information from several cameras to the video recorder using time division multiplexing (TDM). The recorded images are played back simultaneously with the ongoing monitoring of live images, without interrupting the ongoing recording of new images. In such an arrangement, the live and playback videos are displayed in separate dedicated areas of the monitor, each of which is associated with a different time. Thus, while they are displayed simultaneously on the same monitor, they do not form a composite video sequence that shows the spatial and temporal progress of an object in a single video sequence.
WO2010076268 [8] discloses a digital video recording and playback apparatus having one or more receivers for receiving media content from one or more sources external to the apparatus. The received media content is stored and combined contemporaneously with live content received by one of the receivers. For example, live topical information can be obtained from an external source or sources such as Internet feeds, transmitted metadata or live topical information and overlaid on programs or inserted between programs.
Such an arrangement allows auxiliary video information to be superimposed or montaged on a live feed as is well-known, for example in TV weather forecasts where the forecaster presents a live feed during which pre-recorded content is displayed. However, there is no suggestion to superimpose on to the live feed, content that was itself part of the live feed but no longer is; or which is part of the current live feed but whose movement is of interest.
U.S. Pat. No. 8,102,406 [3] discloses a method and system for producing a video synopsis which transform a first sequence of video frames of a first dynamic scene to a second sequence of at least two video frames depicting a second dynamic scene. A subset of video frames in the first sequence is obtained that show movement of at least one object having a plurality of pixels located at respective x, y coordinates and portions from the subset are selected that show non-spatially overlapping appearances of the at least one object in the first dynamic scene. The portions are copied from at least three different input frames to at least two successive frames of the second sequence without changing the respective x, y coordinates of the pixels in the object and such that at least one of the frames of the second sequence contains at least two portions that appear at different frames in the first sequence.
The output of this approach is a composite video sequence whose frames include dynamic objects whose movement is depicted in the output video sequence. Objects that appeared at different times in the input video are shown simultaneously in the output video, superimposed over a background taken from the input sequence.
WO2006/048875 [9] discloses a method and system for manipulating temporal flow in a video. A first sequence of video frames of a first dynamic scene is transformed to a second sequence of video frames depicting a second dynamic scene such that in one aspect, for at least one feature in the first dynamic scene respective portions of the first sequence of video frames are sampled at a different rate than surrounding portions of the first sequence of video frames; and the sampled portions are copied to a corresponding frame of the second sequence. This allows the temporal synchrony of features in a dynamic scene to be changed.
Reference is also made to C. Stauffer and W. E. L Grimson [4], which discusses a video processing method to distinguish between dynamic objects that move relative to a static background. Each pixel is modeled as a mixture of Gaussians and an on-line approximation is used to update the model. The Gaussian distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. The dynamic objects are extracted by subtracting the background.
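The per-pixel Gaussian modeling of Stauffer and Grimson [4] can be illustrated with a simplified sketch. The full method models each pixel as a mixture of Gaussians; the sketch below keeps a single Gaussian per pixel (running mean and variance) purely for illustration, and all names and parameter values are assumptions rather than taken from the reference:

```python
import numpy as np

def update_background(model_mean, model_var, frame, alpha=0.05, k=2.5):
    """One update step of a simplified per-pixel Gaussian background model.

    Each pixel keeps a running mean and variance; pixels lying further than
    k standard deviations from the mean are flagged as foreground.
    (Stauffer-Grimson maintains a mixture of Gaussians; this keeps one.)
    """
    diff = frame.astype(np.float64) - model_mean
    foreground = diff ** 2 > (k ** 2) * model_var
    # Update the model only where the pixel matches the background.
    bg = ~foreground
    model_mean[bg] += alpha * diff[bg]
    model_var[bg] += alpha * (diff[bg] ** 2 - model_var[bg])
    return foreground

# Toy scene: a static background of value 100 with one bright object pixel.
mean = np.full((4, 4), 100.0)
var = np.full((4, 4), 25.0)
frame = np.full((4, 4), 100.0)
frame[1, 2] = 200.0
fg = update_background(mean, var, frame)
```

Subtracting the estimated background in this way yields a foreground mask from which the dynamic objects can be segmented.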
T. Ko, S. Soatto, D. Estrin [5] also uses background subtraction for distinguishing between dynamic objects that move relative to a static background.
Background subtraction is very efficient in detecting and segmenting moving objects. More recently, methods based on Neural Networks were suggested for object segmentation [10, 11]. These methods can accurately segment objects from a single image or a single frame, and also identify the object class. Since only a single image is used, these objects can be either moving or stationary.
It is an object of the invention to provide a video-processing method and system, which superimpose playback objects, corresponding to the image appearance of previously appearing objects, on to a live video sequence. The live video is captured substantially in real time and itself may include both static and dynamic objects.
This object is achieved in accordance with the invention by combining visual information from the live video feed with playback objects. This is done by selecting objects extracted from earlier portions of the video and combining them with the live video feed. The result is a single combined video that displays objects that appeared in the past together with live video.
Thus in accordance with an embodiment of the invention there is provided a computer-implemented method for displaying live video frames of a current scenario captured by a video camera together with playback of previously captured objects, the method comprising:
One possible application of such a method is a video surveillance system, where a video camera captures a current scene for displaying live feed on a monitor. In the event of a security event triggered, for example, by an intrusion, previously captured objects are inserted into the live feed so as to allow progress of the captured objects to be displayed without interfering with the ongoing video capture and display of the current scene.
In such an application, the previously captured objects of a predetermined characteristic are typically moving and are preferably inserted into the current video frame at the same locations from which they were extracted in the previously captured frames. However, objects of different characteristics may be identified. For example, the invention may be used to track a stationary vehicle found at a crime scene, in which case there will be many frames in which the vehicle is motionless. Optionally, the respective times associated with the captured objects are displayed either alongside the objects or when selected, e.g. using a computer mouse or other pointing device, so that the progress of such objects can be clearly viewed in correct spatial orientation within the current scene.
For the sake of clarity and out of abundant caution, we use the terms “live” and “real time” to denote video images that are captured continuously. In any video system where video frames are captured and buffered prior to being displayed, there is always a small and negligible delay between video capture and its subsequent display. In the present invention, the live video frames are buffered and at least some frames are processed in order to stitch playback objects. This need not impose a significant delay since the video frames are also continually processed to identify and store predetermined objects that may be subsequently extracted and stitched into a buffered live frame prior to its being displayed. Indeed, with currently available computing power, objects can be identified in the same interval in which a frame is displayed in real time, and the objects thus identified can be stored for subsequent playback. Furthermore, if the computation speed is sufficiently high, for example if the process of object extraction is 60 times faster than the video frame rate (i.e., 60 minutes of video can be processed in 1 minute), it is possible to apply the processing of past video in parallel with the triggering event, with minimal latency.
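The buffer-and-stitch arrangement described above can be sketched as a minimal loop: live frames pass through a short buffer, and playback objects destined for a given output frame are stitched in before that frame is displayed. The function and variable names below are illustrative only and do not correspond to any module named in this specification:

```python
from collections import deque
import numpy as np

def run_pipeline(frames, playback_objects, buffer_len=3):
    """Buffer live frames briefly, stitch playback objects, then display.

    `playback_objects` maps an output frame index to a list of
    (mask, pixels) pairs extracted from earlier frames.
    """
    buffer = deque(maxlen=buffer_len)
    displayed = []
    for idx, frame in enumerate(frames):
        buffer.append(frame.copy())
        out_idx = idx - (buffer_len - 1)
        if out_idx < 0:
            continue               # still filling the buffer
        out = buffer[0].copy()     # oldest buffered frame is displayed next
        for mask, pixels in playback_objects.get(out_idx, []):
            out[mask] = pixels[mask]   # stitch playback object over live frame
        displayed.append(out)
    return displayed

# Five toy "live" frames, with one playback object stitched into frame 0.
frames = [np.full((2, 2), float(i)) for i in range(5)]
mask = np.array([[True, False], [False, False]])
objs = {0: [(mask, np.full((2, 2), 9.0))]}
displayed = run_pipeline(frames, objs)
```

The buffer introduces only a fixed, small delay (here, two frame intervals), consistent with the negligible-latency definition of “live” given above.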
This clarification is also pertinent in distinguishing the invention over known video synopsis techniques such as those described in [3] and [6]. Since many of the computational techniques employed in video synopsis may be used in the present invention, it is appropriate to emphasize where the two approaches differ.
Video synopsis [3, 6] processes stored video frames, identifies dynamic objects and creates a new output video at least some of whose frames contain multiple instances of selected dynamic objects, each taken from different frames of the input video and therefore captured at different times. The output video thus shows motion of objects that occurred in the past, as does the present invention. But the output frames of the video synopsis do not, or at least need not, include any other meaningful features, since their only purpose is to show the progress through space and time of objects that typically appeared in the past. As opposed to this, while the present invention, in one aspect, also seeks to display moving objects that appeared in the past, the output video of the present invention must continue to show objects that are currently being captured.
It is important to note that while an important application of the invention relates to surveillance this is not its only application and the invention may find general application wherever the historical appearance of an object in a live stream is to be shown in real time as part of the live video. One such example could be a nature program that shows a snake hiding in the sand with its camouflaged eyes slightly protruding awaiting prey. The live feed might show an unsuspecting lizard that passes by and in response to which the snake jumps into visibility and devours the lizard. The narrator may want to display this amazing feat together with historic progress of the snake so as to show on the live feed where, for example, the location of the snake's head was in previous frames prior to its suddenly emerging from the sand.
The invention is best summarized with reference to
In a preferred embodiment, playback objects (objects extracted from the playback video) should be positioned inside the live video while minimally obscuring the objects of the live video. This may be done by detecting the initial appearance of live objects, predicting their possible future path, and avoiding or minimizing the overlap between inserted playback objects and the predicted future path of objects in the live video. One possibility for predicting the path of live objects is to collect statistics about the paths taken by objects that appeared earlier, and to select for each live object a historic object that was at a similar location with similar properties (e.g. speed, appearance, etc.). The path taken by that historic object can be used to estimate the future path of the live object. Once a predicted path exists for a live object, the playback object can be placed at a time that minimizes the overlap between the inserted playback object and the predicted paths of the live objects. Methods for such placement are described in the video synopsis patents [3, 6].
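The historic-object matching described above can be sketched as a nearest-neighbour lookup: the historic trajectory whose starting location and velocity best match the live object's supplies the prediction. This is only an illustrative sketch of the idea; the names, the distance measure and the data layout are all assumptions, not taken from the patents cited:

```python
import numpy as np

def predict_path(live_start, live_velocity, history):
    """Predict a live object's future path from historic trajectories.

    `history` is a list of (start_xy, velocity_xy, path) tuples for objects
    observed earlier in the scene; the entry whose start point and velocity
    are jointly closest to the live object's supplies the predicted path.
    """
    def score(entry):
        start, vel, _ = entry
        return (np.linalg.norm(np.subtract(start, live_start))
                + np.linalg.norm(np.subtract(vel, live_velocity)))
    _, _, best_path = min(history, key=score)
    return best_path

# Two historic objects: one moved right, one moved up.
history = [
    ((0, 0), (1, 0), [(0, 0), (1, 0), (2, 0)]),
    ((0, 0), (0, 1), [(0, 0), (0, 1), (0, 2)]),
]
# A live object starting at the origin and moving right matches the first.
path = predict_path((0, 0), (1, 0), history)
```

Playback objects would then be scheduled so as not to intersect the returned path.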
There may be instances where the position of an inserted previously captured object (playback object) is the same or significantly overlaps an object in the current (live) video. There may be cases where this is acceptable, but if not, the transparency of one or both objects can be adjusted so as to allow simultaneous viewing of both objects. Alternatively, one of the objects can be displayed in monochrome or even as an icon without obscuring the other.
The invention is distinguished over hitherto-proposed systems where live and playback segments are displayed on separate screens or in independent areas of the same screen in the following respects. Both the live and playback videos observe the same scene and playback objects are placed in the output video at the same scene locations they originally appeared. In a preferred embodiment, the live video frames are played as the background, even before objects are extracted from them and inserted into this background. In order to prevent collision/overlap between live and playback objects, even without object extraction on the live video, we can detect the location of live objects in a couple of frames, and estimate their future trajectory based on objects that appeared previously in the scene.
We can distinguish between live objects and playback objects in several ways, two possibilities being: (i) keep the live objects in color and turn the playback objects monochrome; (ii) display a time stamp in association with the playback objects. But many other methods of distinguishing live from playback objects are possible.
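The first of these possibilities, rendering playback objects in monochrome within a colour live frame, amounts to a per-pixel luminance substitution. The following is a minimal sketch with illustrative names; a production system would typically use a proper luminance formula or a library such as OpenCV rather than the simple channel mean used here:

```python
import numpy as np

def stitch_monochrome(live_frame, obj_pixels, obj_mask):
    """Stitch a playback object into a live RGB frame in monochrome,
    so viewers can tell it apart from live (colour) objects."""
    out = live_frame.copy()
    gray = obj_pixels.mean(axis=-1, keepdims=True)   # simple luminance proxy
    out[obj_mask] = np.repeat(gray, 3, axis=-1)[obj_mask]
    return out

# A black live frame and a coloured playback object covering one pixel.
live = np.zeros((2, 2, 3))
obj = np.tile(np.array([30.0, 60.0, 90.0]), (2, 2, 1))
mask = np.array([[True, False], [False, False]])
out = stitch_monochrome(live, obj, mask)
```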
Playback objects can have many flavors: (i) a video synopsis of a predefined past period, say a video synopsis of the last hour; (ii) the video played backward in time starting from the time the display is activated, in which case the live objects will move forward while the playback objects move backwards; or (iii) any other selection of objects from the past.
In order to understand the invention and to see how it may be carried out in practice, embodiments will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:
By way of example consider the following scenario: in a monitored facility with many corridors, rooms, and cameras, some of the rooms have limited access. An intruder enters a restricted zone through a door monitored by a surveillance camera, and vanishes quickly in one of the inside rooms without being observed by the guard. Following an intruder alarm the guard faces two tasks: (1) provide to other guards the description of the intruder, and (2) check when the intruder leaves the restricted zone, i.e., exits via the same door. The first task requires watching playback video, while the second requires watching live video.
The stitching process performed by the stitching module 260 can be implemented in various ways. An object can be placed into the live video as is, replacing the pixel values at the respective location in the live video. Other methods for seamless blending can also be used, such as alpha blending, pyramid blending [1, 2], gradient-domain blending, etc. The resulting video stream displayed on a monitor 270 contains information both from the current live video and from objects in the past that appeared in the recorded video. As in video synopsis [3, 6], events that occurred at different times are presented simultaneously.
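The two simplest stitching options mentioned above, outright pixel replacement and alpha blending, can be expressed in a few lines. This is an illustrative sketch only (pyramid and gradient-domain blending are substantially more involved); the function name and parameters are assumptions:

```python
import numpy as np

def stitch(live, obj, mask, alpha=1.0):
    """Insert a playback object into a live frame.

    alpha=1.0 replaces the live pixels outright; alpha<1.0 gives a
    semi-transparent overlay so an overlapping live object stays visible.
    """
    out = live.astype(np.float64).copy()
    out[mask] = alpha * obj[mask] + (1.0 - alpha) * out[mask]
    return out

# Half-transparent insertion of a bright object into a dark live frame.
live = np.zeros((2, 2))
obj = np.full((2, 2), 100.0)
mask = np.array([[True, False], [False, False]])
out = stitch(live, obj, mask, alpha=0.5)
```

The same `alpha` control realizes the transparency adjustment suggested earlier for cases where a playback object overlaps a live object.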
There are important differences between the embodiments depicted in
Note that in both cases, with and without motion detection, the extracted objects are preferably displayed in their original motion direction, in contrast to regular backward video playback, where object motion is the reverse of the original motion. For example, if a captured video contains a person walking from left to right, the present invention may display his walk from left to right even when going back in time.
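One way to realize this ordering is to reverse the sequence of object appearances (tracklets) while keeping the frames within each tracklet in their original order, so each object still moves in its original direction. The sketch below illustrates the ordering with symbolic frame labels; the term "tracklet" and the function name are illustrative:

```python
def backward_playback_order(tracklets):
    """Order object tracklets for reverse-time playback while preserving
    each object's own motion direction: the most recent tracklet plays
    first, but its frames run forward, so a person who walked left to
    right is still shown walking left to right."""
    return [frame for tracklet in reversed(tracklets) for frame in tracklet]

# Two objects, earliest first; each tracklet's frames are in capture order.
tracklets = [["a1", "a2"], ["b1", "b2"]]
order = backward_playback_order(tracklets)
```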
The workflows depicted in the figures are not limited to display on a local device. Thus,
As best seen in
Since block replacement suffices to stitch objects in the compressed domain, there is no need to re-compress the resulting video and the compressed output video can be sent to the display device 1099.
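The block-replacement idea can be illustrated with a spatial-domain analogue: whole fixed-size blocks from the object frame are swapped into the live frame, standing in for the coded macroblocks that would be replaced in the compressed bitstream without re-encoding. The 16x16 block size and all names below are illustrative assumptions; real compressed-domain stitching operates on coded block data, not raw pixels:

```python
import numpy as np

BLOCK = 16  # macroblock size assumed by many block-based codecs

def replace_blocks(live, obj, block_mask):
    """Stitch an object by swapping whole 16x16 blocks, a rough stand-in
    for compressed-domain stitching where coded blocks are replaced
    without re-encoding the frame."""
    out = live.copy()
    for by, bx in zip(*np.nonzero(block_mask)):
        y, x = by * BLOCK, bx * BLOCK
        out[y:y + BLOCK, x:x + BLOCK] = obj[y:y + BLOCK, x:x + BLOCK]
    return out

# A 32x32 frame divided into a 2x2 grid of blocks; replace only the
# top-left block with object content.
live = np.zeros((32, 32))
obj = np.ones((32, 32))
block_mask = np.array([[True, False], [False, False]])
out = replace_blocks(live, obj, block_mask)
```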
It will be appreciated that modifications may be made without departing from the invention as claimed. Specifically, the invention is not limited to intrusion detection or use by guards, but can be applied whenever it is desired to understand quickly what happened. This is a desired functionality in any situation awareness system. The description of the above embodiments is not intended to be limiting, the scope of protection being provided only by the appended claims.
It will also be understood that the system according to the invention may be a suitably programmed computer. Likewise, the invention contemplates a computer program being readable by a computer for executing the method of the invention. The invention further contemplates a machine-readable memory tangibly embodying a program of instructions executable by the machine for executing the method of the invention.
It should also be noted that features that are described with reference to one or more embodiments are described by way of example rather than by way of limitation to those embodiments. Thus, unless stated otherwise or unless particular combinations are clearly inadmissible, optional features that are described with reference to only some embodiments are assumed to be likewise applicable to all other embodiments.
Without derogating from the above generalizations, the inventive concepts encompassed by the invention include the following:
Inventive concept 1: A computer-implemented method for displaying live video frames of a current scenario captured by a video camera together with playback of previously captured objects, the method comprising:
The instant application claims priority as a non-provisional of U.S. Provisional Application Ser. No. 62/711,079, filed on Jul. 27, 2018, presently pending, the contents of which are incorporated by reference.