Slow or fast motion video using depth information

Information

  • Patent Grant
  • Patent Number
    11,659,135
  • Date Filed
    Monday, October 19, 2020
  • Date Issued
    Tuesday, May 23, 2023
Abstract
Systems comprising a digital camera, an interface operable to mark a first entity in a frame of an input video stream and to determine a frame rate ratio FR1/FR2 between a first frame rate FR1 and a second frame rate FR2, a processor configurable to generate an output video stream of the digital camera, wherein the output video stream includes a first entity played at FR1 and a second entity played at FR2, and methods of using and providing same.
Description
FIELD

Embodiments disclosed herein relate in general to video generation and processing.


BACKGROUND

In known art, a recorded video stream is played at a sequentially constant frame rate (FR), with the option for the user to change the frame rate for all or some sequences of frames and to make these sequences appear in slow motion or time lapse. The slow motion or time lapse video streams are generated from a sequence of input frames that are played at a FR modified with respect to the FR used to capture the scene.


In highly professional setups such as the movie industry, there is an additional method, where the FR is controlled and modified only for some specific spatial information of the input frames. This is done mainly to highlight specific persons, objects or scenes, by playing the areas to be highlighted with a different frame rate than the rest of the frame.


For visual effects and improved user experience, it would be beneficial to have a system and method that plays areas to be highlighted at a different frame rate than the rest of the frame in an automated manner and under the processing power constraints existing in devices such as smartphones or tablets.


SUMMARY

In various embodiments there are provided systems, comprising a digital camera, an interface operable to mark a first entity in a frame of an input video stream and to determine a frame rate ratio FR1/FR2 between a first frame rate FR1 and a second frame rate FR2, and a processor configurable to generate an output video stream of the digital camera, wherein the output video stream includes a first entity played at FR1 and at least one second entity played at FR2.


In an exemplary embodiment, the first entity is an object of interest (OOI) or region of interest (ROI) and the at least one second entity is selected from the group consisting of another object, an image foreground, an image background and a combination thereof.


In an exemplary embodiment, the output video stream includes at least one added entity played at a frame rate that is different from the first FR and the second FR.


In an exemplary embodiment, the given input stream includes at least one given entity played at a frame rate that is different from the first FR and the second FR.


In an exemplary embodiment, the interface is operable by a human user.


In an exemplary embodiment, the interface is operable by an application or by an algorithm.


In an exemplary embodiment, the OOI or the ROI is identified in at least a single frame of the input video stream with an object classification or an object segmentation algorithm.


In an exemplary embodiment, the OOI or ROI is tracked at least through a part of input video stream with a tracking algorithm.


In an exemplary embodiment, the processor is further configured to use a depth map stream that is spatially and temporally aligned with the input video stream to generate the output video stream.


In an exemplary embodiment, the depth map is used to determine a depth of each entity.


In an exemplary embodiment there is provided a method, comprising: in a digital camera configured to obtain an input video stream and to output an output video stream, marking a first entity in a frame of the input video stream, determining a frame rate ratio FR1/FR2 between a first frame rate FR1 and a second frame rate FR2, and generating the output video stream, wherein the output video stream includes a first entity played at FR1 and a second entity played at FR2.


In an exemplary embodiment, the method further comprises using a depth map to determine a depth of each entity.





BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting examples of embodiments disclosed herein are described below with reference to figures attached hereto that are listed following this paragraph. The drawings and descriptions are meant to illuminate and clarify embodiments disclosed herein, and should not be considered limiting in any way:



FIG. 1 illustrates an example video output provided by a method disclosed herein;



FIG. 2 shows a general flow chart of an exemplary embodiment of a method disclosed herein;



FIG. 3 illustrates respective frame rate masks and binned depth maps of a specific image for two different cases;



FIG. 4 presents an example of an image set of a scene containing an RGB image (left side), a depth map (center) and a derived SD map (right side);



FIG. 5A shows a block diagram of an exemplary system used to run a method disclosed herein in a first example;



FIG. 5B shows a block diagram of an exemplary system used to run a method disclosed herein in a second example;



FIG. 5C shows an embodiment of a camera disclosed herein;



FIG. 5D shows an embodiment of a host device disclosed herein;



FIG. 6A shows a video RGB input stream and a video depth image input stream of the same scene;



FIG. 6B shows the FRM generated for the input video streams of FIG. 6A for case A;



FIG. 6C shows the FRM generated for the input video streams of FIG. 6A for case B;



FIG. 7 presents RGB images, depth maps and selected depth masks from frames related to the scene in FIGS. 6A-C that do and do not contain all object information;



FIG. 8A presents RGB images, depth masks and the depth information reconstruction process of case B for complete (first row) and incomplete (second row) depth mask information;



FIG. 8B shows case B selected depth masks and RGB image segments derived with these masks for complete (first row) and incomplete (second row) image information.





DETAILED DESCRIPTION
Definitions

“Entity”: a section or part of an RGB frame with information different from other sections or parts of the frame. Examples of such an entity are objects of interest (OOIs) or regions of interest (ROIs), as well as their respective foreground and background. The objects or regions of interest can be selected manually by the user or automatically by a dedicated algorithm.


“Assigned depth”: depth information on single pixels or segments of an RGB image which is obtained from a depth map that covers the same scene from the same (or similar) point of view (POV) as the RGB image.


“Selected Depth” (SD): depth of one or more selected objects in the RGB image.


“SD+”: depths that are further away from the camera than SD.


“SD−”: depths that are closer to the camera than SD.


“Binned Depth Map” (BDM): a depth map that classifies the originally continuous depth map into a discrete depth map of several classes, each class covering a range of specific depths. Here, we use 2-class and 3-class BDMs.


“Frame Rate Mask” (FRM): a binary mask that includes all pixels that are to be played at a first frame rate (FR1), while the part outside of the mask is played at a second frame rate (FR2). By definition, SD is played at FR1 while SD+ is played at FR2. In a general case, a plurality of FRMs with different frame rates, e.g. FR3, FR4 or FR5, may be provided. In this case, the FRM expands to a mask discriminating 3, 4 or 5 pixel groups. A short illustrative sketch of how a BDM and a FRM may be derived from a depth map follows these definitions.


“PFR1”: group of pixels played in FR1 (marked in white in the FRM presented e.g. in FIG. 3).


“PFR2”: group of pixels played in FR2 (marked in black in the FRM presented e.g. in FIG. 3).


“PFR3”: group of pixels played in FR3 (not shown in the figures herein).
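
As an illustration of the BDM, FRM and PFR definitions above, the following minimal sketch (assuming NumPy arrays, a per-pixel depth map and a selected-depth interval [sd_min, sd_max]; all names are illustrative and not taken from the patent) bins a depth map into the classes SD−, SD and SD+ and derives a binary FRM whose True pixels form PFR1 and whose False pixels form PFR2:

    import numpy as np

    def binned_depth_map(depth, sd_min, sd_max):
        """3-class BDM: 0 = SD- (closer than the selected depth),
        1 = SD (the selected depth), 2 = SD+ (farther than the selected depth)."""
        bdm = np.full(depth.shape, 2, dtype=np.uint8)      # default: SD+
        bdm[depth < sd_min] = 0                            # SD-
        bdm[(depth >= sd_min) & (depth <= sd_max)] = 1     # SD
        return bdm

    def frame_rate_mask(bdm, include_closer=True):
        """Binary FRM: True pixels belong to PFR1 (played at FR1), False pixels
        to PFR2 (played at FR2). include_closer decides whether SD- pixels are
        grouped with the selected depth (PFR1) or with the rest (PFR2)."""
        return bdm <= 1 if include_closer else bdm == 1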



FIG. 1 illustrates an example video output provided by a method disclosed herein. The figure shows nine consecutive frames 1-9 of a video stream, with a left column showing original frames (input data) and with a right column showing output (or generated) frames (output data). The video stream includes two objects, a first object 102 (a runner distanced farther from a viewer, i.e. in the “back”) and a second object 104 (a runner distanced closer to a viewer, i.e. in the “front”). For simplicity, numerals 102 and 104 are shown only in frames 1 and 9. In the original video, object 104 is running faster than object 102. In the shown output video, object 104 is selected to be played two times slower than in the original video. The outcome is that object 104 is now seen running slower than object 102.



FIG. 2 shows a general flow chart of an exemplary embodiment of a method disclosed herein. A video stream (sequence of N frames) 202 recorded at a certain user- or application-assigned frame rate FR is used as input. In step 204, the user or application marks an object of interest (e.g. object 104) or region of interest and a relative velocity of the OOI or the ROI. In general, the OOI or ROI is only marked in one of the frames, e.g. in the 1st of the N frames. The relative velocity (or “slow motion factor”) of the OOI or the ROI defines a frame rate ratio between the frame rate with which the OOI or the ROI is played, and the frame rate at which the foreground and/or background are played. In step 206, frames used for generating an output stream are selected. These are referred to henceforth as “selected frames”.


In a first example and with reference to FIG. 1, one wants to make object 104 (and optionally additional segments of the frames) move half as fast as in the original video stream, corresponding to a relative velocity (slow motion factor) and a frame rate ratio of 2. At least two frames need to be selected in order to obtain information on the movement of OOI 104 and on the movement of foreground FG and background BG (i.e. all the pixels in the frame except object 104). If one wants to achieve the given effect with fewer than four frames, movement models predicting the inter-frame movement have to be deployed.


The selection of at least two frames may be made in various ways. One option is presented in Table 1,

Table 1

OutIdx   1   2   3   4   5   6   7   8
ObjIdx   1   1   2   2   3   3   4   4
BGIdx    1   2   3   4   5   6   7   8

where ObjIdx is the index of the input frame from which the OOI (i.e. object 104) is taken, BGIdx is the index of the input frame from which the background is taken, and OutIdx is the index of the respective output frame.


In step 208, the OOIs are detected in the at least two selected frames. In step 210, the algorithm calculates a segmentation mask for the OOI. In step 212, data missing (e.g. caused by occlusion) in the at least four selected frames is filled in from frames other than the selected frames (for example neighboring frames). In step 214, data and information generated in steps 204-212 is processed to generate a new frame. Newly generated frames are assembled into an output video stream 216.
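
Steps 204-216 can be sketched as a small pipeline; the per-stage callables (index selection, segmentation, hole filling, composition) are supplied by the caller, since the patent leaves the concrete algorithms open. This is an illustrative skeleton, not the patent's implementation:

    def generate_output_stream(frames, depths, select_indices, segment_ooi,
                               fill_holes, compose_frame, sm_factor):
        """Illustrative skeleton of the flow of FIG. 2: for every output frame,
        pick the input frames for object and background (step 206), segment the
        OOI (steps 208-210), fill occlusion holes in the background (step 212)
        and compose the new frame (step 214) into the output stream (216)."""
        output = []
        for out_idx in range(1, len(frames) + 1):
            obj_idx, bg_idx = select_indices(out_idx, sm_factor)
            mask = segment_ooi(frames[obj_idx - 1], depths[obj_idx - 1])
            background = fill_holes(frames[bg_idx - 1], mask, frames, bg_idx)
            output.append(compose_frame(frames[obj_idx - 1], background, mask))
        return output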


In this example, one can write a general equation:

ObjIdx = ceil(OutIdx / SMfactor),  BGIdx = OutIdx,

where ceil(x) returns the smallest integer that is greater than or equal to x (i.e. rounds up to the nearest integer) and SMfactor is the slow-motion factor of the object (in this example SMfactor=2).
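
A minimal Python rendering of this index selection (the function name is illustrative) reproduces Table 1 for SMfactor = 2:

    import math

    def select_indices_slow(out_idx, sm_factor=2):
        """Slow-motion case (Table 1): the OOI advances sm_factor times slower
        than the background."""
        return math.ceil(out_idx / sm_factor), out_idx   # (ObjIdx, BGIdx)

    # OutIdx 1..8 -> ObjIdx 1,1,2,2,3,3,4,4 and BGIdx 1..8
    assert [select_indices_slow(i) for i in range(1, 9)] == \
           [(1, 1), (1, 2), (2, 3), (2, 4), (3, 5), (3, 6), (4, 7), (4, 8)]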


In a second example, one wants to make object 104 move twice as fast as in the original video stream. Again, at least two frames need to be selected. One option is presented in Table 2.

Table 2

OutIdx   1   2   3   4   5   6   7   8
ObjIdx   1   2   3   4   5   6   7   8
BGIdx    1   1   2   2   3   3   4   4

In this example, the general equation is:

ObjIdx = OutIdx,  BGIdx = ceil(OutIdx / SMfactor).
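
Analogously, a small sketch of the fast-motion index selection (illustrative name) reproduces Table 2 for SMfactor = 2:

    import math

    def select_indices_fast(out_idx, sm_factor=2):
        """Fast-motion case (Table 2): the background advances sm_factor times
        slower than the OOI, so the OOI appears sm_factor times faster."""
        return out_idx, math.ceil(out_idx / sm_factor)   # (ObjIdx, BGIdx)

    # OutIdx 1..8 -> ObjIdx 1..8 and BGIdx 1,1,2,2,3,3,4,4
    assert [select_indices_fast(i)[1] for i in range(1, 9)] == [1, 1, 2, 2, 3, 3, 4, 4]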






Given a video of RGB images, i.e. frames F = {fi}, i = 1 . . . NFrames, and a depth map overlaying each frame, D = {di}, i = 1 . . . NFrames, methods disclosed herein generate new videos in which the pixel groups PFR1 and PFR2 are played at different frame rates. The depth map can be obtained using for example stereo-camera triangulation, depth from motion, gated imaging, time of flight (TOF) cameras, coded aperture based cameras, a Laser Auto-focus unit (“Laser AF”), an image sensor with Phase Detection Auto Focus (“PDAF”) capability, etc. In depth maps shown herein, the gray scale depicts the respective depth (white=zero distance from camera, black=infinite distance from camera). The depth maps or images discussed herein are assumed to be captured from the same (or a similar) POV as the RGB images shown along with them, and to be captured substantially simultaneously with those RGB images.


For the sake of clarity the term “substantially” is used herein to imply the possibility of variations in values within an acceptable range. For example, “substantially simultaneously” may refer to the capture of frames for two video streams within ±5 ms, ±10 ms, ±20 ms or even ±30 ms. Likewise, “substantially simultaneously” may refer to the synchronization of frames from two video streams within ±5 ms, ±10 ms, ±20 ms or even ±30 ms.
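
One simple way to enforce such a tolerance when pairing RGB and depth frames is sketched below, assuming each stream exposes per-frame capture timestamps in milliseconds (an illustration, not part of the disclosure):

    def pair_frames_by_time(rgb_times_ms, depth_times_ms, tolerance_ms=20):
        """For each RGB frame, pick the depth frame captured closest in time and
        keep the pair only if the offset is within the tolerance (e.g. +/-20 ms)."""
        pairs = []
        for i, t_rgb in enumerate(rgb_times_ms):
            j = min(range(len(depth_times_ms)),
                    key=lambda k: abs(depth_times_ms[k] - t_rgb))
            if abs(depth_times_ms[j] - t_rgb) <= tolerance_ms:
                pairs.append((i, j))
        return pairs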


We distinguish two cases for the frame rate of segments of the image that are closer to the camera (i.e. SD−):


Case A (Example 1): the OOI or ROI and image segments closer to the camera than the OOI or ROI (foreground FG) are played at FR1, while image segments farther from the camera than the OOI or ROI (background BG) are played at FR2. SD− is played at the same FR as SD (i.e. FR1) and all the other depths are played at FR2. Thus PFR1=SD∪SD− and PFR2=SD+. In this case, we do not need to indicate where the pixels of SD− are, since they are played at the same FR as SD, such that OOIs or ROIs at SD will never be occluded. Therefore, we obtain FRM=BDM.


Case B (Example 2): only the OOI or ROI is played at FR1, while both FG and BG are played at FR2. SD− is played at the same FR as SD+ (i.e. FR2). Thus PFR1=SD and PFR2=SD+∪SD−. Since SD and SD− are played at different frame rates, some information will be missing in the newly generated frames because of occlusions.


In an additional, third example, different “depth slices” (parts of the image with a certain corresponding depth range), for example a first depth slice 1: 0.5-1 m, a second depth slice 2: 1-2 m, and a third depth slice 3: 2-4 m, are played with different FRs. For example, the RGB information of depth slice 1 is played at FR1, the RGB information of depth slice 2 is played at FR2, the RGB information of depth slice 3 is played at FR3, etc. In some examples it may be FR1<FR2<FR3 etc., or vice versa. In other examples, there may not be such a FR order according to depth. This slicing principle may be used to, for example, highlight an OOI or ROI by leaving the OOI or ROI unmoved, and letting the BG move faster the farther away it is from the OOI or ROI. In some examples, artificial objects may be added to one or more of the depth slices. An artificial object may be an artificially created object such as an object drawn manually or by a computer. An artificial object may be image data not included in one of the images of the input video stream. In some examples, an artificial object may be image data from an image captured with another camera of a same host device.
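
A sketch of such a depth-slice to frame-rate assignment is given below; the slice boundaries follow the example above, while the (near, far, frame rate) tuple format and the fallback behavior are assumptions:

    def frame_rate_for_depth(depth_m,
                             slices=((0.5, 1.0, 'FR1'), (1.0, 2.0, 'FR2'), (2.0, 4.0, 'FR3'))):
        """Map a pixel depth in meters to a playback frame rate label according
        to depth slices given as (near, far, frame_rate) tuples."""
        for near, far, fr in slices:
            if near <= depth_m < far:
                return fr
        return None  # outside all slices: fall back to the default stream FR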


In other examples, a physical property of entities (e.g. an OOI or ROI) other than depth may be used for defining object, FG and BG. A physical property may be spectral composition. In yet other examples, visual data such as texture of entities (e.g. an OOI or ROI) may be used for defining object, FG and BG.



FIG. 3 illustrates respective FRMs and BDMs of a specific image for cases A and B. Here, the OOI is a dancing girl 302. Corresponding to this image is a depth map of the same scene (not shown here). Here, a depth of the scene is assumed which increases constantly for larger Y values. A constantly increasing depth is e.g. shown in FIG. 4. In the given scene, a runner 304 is closer to the camera than girl 302. In case A, the FRM and BDM include both dancing girl 302 and runner 304 (as well as all other pixels with assigned depth smaller than that of girl 302). In case B, the FRM only includes girl 302, as well as pixel groups of the BG with assigned depth equal to the assigned depth of girl 302. For case B, the BDM is differentiated into three pixel groups with different assigned depths: the depth of OOI 302 (SD, white), a depth larger than the depth of OOI 302 (SD+, black), and a depth smaller than the depth of OOI 302 (SD−, gray).



FIG. 4 presents an example of an image set containing an RGB image (left side), next to a depth map (center) covering the same scene from the same (or very similar) point of view (POV) as that of the RGB image, and next to an SD map (right side) derived according to a method disclosed herein. The specific SD is chosen based on the RGB image and depth map data. A runner 402 is closest to the camera, a girl 404 is farther away from the camera, and a boy 406 is at the farthest distance from the camera. Here, girl 404 is defined as the OOI, leading to the presented specific SD.



FIG. 5A presents a block diagram of a processor numbered 500 in a system disclosed herein and used for case A. The following notations are used: foreground (FG) and background (BG) respective frames fFGIdx and fBGIdx, corresponding respective masks mFGIdx and mBGIdx, generated respective images f̃FGIdx and f̃BGIdx, and a composed new output frame fOutIdx.


Processor 500 may be for example an application processor of a smartphone or a tablet. In processor 500, the input frames of the RGB video stream 502 and of the depth map video stream 504 constitute the data inputs for the method disclosed herein. Depending on a FR speed chosen by a human user (e.g. manually) or chosen by a dedicated algorithm (e.g. automatically), indices of the frames to be used for the output video stream are selected by a FG and BG index selector module 506. These indices are the input for a mask generator module 508 that performs step 210 in FIG. 2. Depending on objects or areas of interest in the RGB image (also chosen by the human user or by the dedicated algorithm), the frames with the indices selected in 506 are requested from a frame and depth selector module 508a. Masks defining the areas that are played with different FRs are calculated in a mask extractor module 508b for the foreground FG, and in a mask extractor module 508c for the background BG. From module 508c, information is fed into a hole filler module 512, where missing information (e.g. missing because of occlusion of the object or area of interest by another object) is replaced by information calculated from input frames of RGB video stream 502 and depth map video stream 504 other than the ones actually used for the output video stream. A new frame generator module 514 assembles the information and outputs the newly generated video stream.



FIG. 5B presents a block diagram of processor numbered 500′ in a system disclosed herein and used for case B. In addition to modules and functions of processor 500 in FIG. 5A, processor 500′ includes an additional selected depth object estimator module 516, in which the depth of the selected object or area is estimated in case the selected object or area is occluded by another object.


Because of the more complex FRM deployed in case B compared to case A, this information must be generated, e.g. by estimation from other frames of the depth map video stream (e.g. neighboring frames), e.g. by deploying a motion model. Module 512 that computes f̃BGIdx remains practically the same as in case A, except for the mask mBGIdx that is passed to module 512. In contrast with case A, the mask now includes only the selected depth and not SD−.



FIG. 5C shows an embodiment of a camera disclosed herein and numbered 520. Camera 520 includes camera elements such as optical components (i.e. a lens system) 522 and an image sensor 524. Camera 520 may be a multi-camera system that has more than one lens system and image sensor. Images and video streams recorded via lens system 522 and image sensor 524 may be processed in an application processor 526 that interacts with a memory 528. A human user can trigger actions in the camera via a human machine interface “HMI” (or simply “interface”) 532. Information that supports actions such as generation of artificial image data and information may be stored in a database 534. In various embodiments, one or more of the components application processor 526, memory 528, HMI 532 and database 534 may be included in the camera. In some embodiments (such as in FIG. 5D) application processor 526, memory 528, HMI 532 and database 534 may be external to the camera.



FIG. 5D shows an embodiment of a host device disclosed herein and numbered 540, for example a smartphone or tablet. Device 540 comprises a camera 542, application processor 526, memory 528, HMI 532 and database 534. In some embodiments, database 534 may be virtual, with information not located physically on the device, but located on an external server, e.g. on a cloud server. Device 540 may comprise a multi camera system, e.g. several cameras for capturing RGB images and one or more additional sensing cameras, e.g. a time of flight (TOF) camera sensing depth information of a scene.


In some examples, camera 542 may provide the video stream input for the method described herein. In other examples, the video stream input may be supplied from outside a host device, e.g. via a cloud server.



FIGS. 6A-6C depict the generation of FRMs for the cases A and B outlined below. FIG. 6A shows two input video streams of the same scene as in FIG. 4, one input stream (left) being of RGB images (also referred to as “RGB image stream”), the other input stream (right) being of depth images (also referred to as “depth image stream”). As in FIG. 4, the images include runner 402, girl (OOI) 404 and boy 406. FIG. 6B shows the FRM generated for the input video streams of FIG. 6A for case A. FIG. 6C shows the FRM generated for the input video streams of FIG. 6A for case B. In input frame 4, we find that girl 404 is partly occluded by runner 402.


In FIG. 6B, the FRM includes the selected depth SD and all the depths closer to the camera (SD−). In this case, the mask that needs to be extracted from the depth image is a binary mask that indicates where SD and SD− are located in the RGB image. In the binary mask, “1” (white) represents the regions of SD and SD− and “0” (black) represents all other depths. Foreground 408 and background 412 refer to segments of the image that have an assigned depth that is smaller and larger than the selected depth, respectively.


The following describes in more detail a general method to provide effects like those in the first and second examples above. In step 206 (FIG. 2), two frames are extracted from the input video streams. An output frame will be composed of these two frames, one frame being used for forming the background fBGIdx and the other frame being used for forming the foreground fFGIdx. BGIdx and FGIdx are indices that indicate which frames from the input frames F are selected; thus BGIdx, FGIdx∈[1, 2 . . . NFrames].


Once the indices from the input frames are chosen, the selected depth masks for the images need to be extracted.


The next step after the extraction of BGIdx and FGIdx is to select the FG and BG frames fFGIdx and fBGIdx together with their corresponding masks mFGIdx and mBGIdx, and to generate the two image frames f̃BGIdx and f̃FGIdx that will be combined (“stitched”) together to compose the new output frame fOutIdx. Since the regions of selected depths are never occluded, f̃FGIdx can be obtained directly from the input frame and the corresponding mask. Therefore, f̃FGIdx = fFGIdx·mFGIdx.


To obtain f̃BGIdx, we need to delete the region in the image where mBGIdx indicates the selected depth, and fill this region with the background. To delete the region with the selected depth, we can for example use f′BGIdx = (1−mBGIdx)·fBGIdx. To fill the missing information in the background, we can use methods such as in-painting (see e.g. Bertalmio, Marcelo, Andrea L. Bertozzi, and Guillermo Sapiro. “Navier-Stokes, fluid dynamics and image and video inpainting.” Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. Vol. 1. IEEE, 2001) or utilize information from consecutive frames (see e.g. Jia, Yun-Tao, Shi-Min Hu, and Ralph R. Martin. “Video completion using tracking and fragment merging.” The Visual Computer 21.8-10 (2005): 601-610). The indices of the input frames which will be used to fill the holes in f′BGIdx are [BGIdx−k, BGIdx+k], where k is a parameter that indicates the number of consecutive frames taken from each side of fBGIdx. In general, k does not have to be constant and can be different from frame to frame, in which case it will be marked as kOutIdx.
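
As one possible concretization of the deletion and in-painting steps, the sketch below uses OpenCV's Navier-Stokes in-painting (cf. the Bertalmio et al. reference above); the array layout and the in-painting radius are assumptions:

    import cv2
    import numpy as np

    def delete_and_inpaint(frame_bgr, sd_mask):
        """Zero out the selected-depth region of a background frame (f'_BGIdx)
        and fill the hole by in-painting. frame_bgr: uint8 HxWx3 image;
        sd_mask: boolean HxW mask that is True at the selected depth."""
        hole = sd_mask.astype(np.uint8) * 255
        cleared = frame_bgr.copy()
        cleared[sd_mask] = 0                          # delete the selected-depth region
        return cv2.inpaint(cleared, hole, 3, cv2.INPAINT_NS)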


Once we have f̃BGIdx and f̃FGIdx, they can be stitched together using mFGIdx and methods described for example in Burt, Peter J., and Edward H. Adelson. “A multiresolution spline with application to image mosaics.” ACM Transactions on Graphics (TOG) 2.4 (1983): 217-236.
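
A lightweight stand-in for the multiresolution-spline blending cited above is a feathered alpha blend of the generated FG and BG images using the FG mask; the sketch below assumes uint8 images and a Gaussian feather width chosen only for illustration:

    import cv2
    import numpy as np

    def stitch_fg_bg(fg_bgr, bg_bgr, fg_mask, feather_sigma=5):
        """Compose the output frame from the generated FG and BG images using a
        Gaussian-feathered version of the FG mask as blend weights."""
        alpha = cv2.GaussianBlur(fg_mask.astype(np.float32), (0, 0), feather_sigma)
        alpha = np.clip(alpha, 0.0, 1.0)[..., None]    # HxWx1 weights in [0, 1]
        out = alpha * fg_bgr + (1.0 - alpha) * bg_bgr
        return np.clip(out, 0, 255).astype(np.uint8)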


In case A, we used the depth map to detect the selected depth and all the depths closer to the camera, which were to be played at the same FR. As a result, in each frame the objects in the RGB image at the selected depth contain all the information needed to generate the new frame of the output video.


In case B, we use the depth map in order to detect the regions of selected depth, which are to be played at the same FR. All regions with other corresponding depths are to be played at a different FR. Here, in general, the object in the selected depth will not contain all the information needed to compose the new frame (see e.g. input frame 4 in FIG. 6A), and there is a need to generate this information, e.g. by algorithms generating artificial input based on prior “experience”, or from other frames, e.g. from subsequent consecutive frames (e.g. by using a motion model). In this case, it is possible that an object that is closer to the camera than the selected depth will occlude parts of the objects in the selected depth, so that the FG frame and the corresponding mask will have holes where data is missing.



FIG. 6C shows the FRM generated for the input video streams of FIG. 6A for case B. In input frame 4, we find that girl 404 is partly occluded by runner 402.


The selection of the frame indices from the input remains the same as in case A. The mask extracted from the depth image for the selected depth, mFGIdx, does not contain all the information for the objects in the selected depth, and therefore a new mask m̃FGIdx needs to be defined. This mask is not extracted from the depth image, but estimated, e.g. by using information from other frames.


The information of the object in the selected depth that exists in fFGIdx will be referred to as f′FGIdx = fFGIdx·mFGIdx. The frame with full object information within the selected depth (derived e.g. based on information from consecutive frames) is given by f̃FGIdx.
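
As a very simple stand-in for that estimation (a real implementation would motion-compensate the neighboring mask as described above), the occluded mask can be completed from a registered mask of a neighboring frame:

    import numpy as np

    def complete_fg_mask(m_fg, m_fg_neighbor):
        """Estimate the complete selected-depth mask for case B as the union of
        the occluded mask and a (registered) neighboring-frame mask. Also return
        the pixels that had to be generated (the gray regions in FIG. 8A)."""
        m_full = np.logical_or(m_fg, m_fg_neighbor)
        generated = np.logical_and(m_full, np.logical_not(m_fg))
        return m_full, generated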



FIG. 7 presents RGB images (first column), corresponding depth maps (second column) and the selected depth masks (third column for case A, fourth column for case B) from frames related to the scene in FIGS. 6A-C that do and do not contain all information of the object in the selected depth. The situation of missing data in case B as described above is illustrated in the second row of FIG. 7, where information on the mask is missing because of occlusion of an object by another object.


In case A, the mask is a binary mask. In case B, the mask is a mask with three values: 0 (black), 0.5 (gray) and 1 (white).



FIG. 8A presents the same RGB images as shown in FIG. 7 (first column), with the same corresponding depth maps as in FIG. 7 (not shown again here), and the information reconstruction process for the depth map part (second to fifth column) for case B, both for the case of complete depth map information (row 1) and for the case of incomplete depth map information because of occlusion (row 2). In the second row and fourth column, selected depth masks are presented that partly need to be generated, e.g. by estimations based on information of other frames; the gray parts in the mask are parts that are to be generated. In the fifth column, the selected depth mask with the generated information is shown. This depth mask can further on be used for the composition of the new output frame.



FIG. 8B shows, along with the selected depth masks (second and fourth column), the respective masked RGB image segments (third column) and the RGB output frame of the computational step which fills missing data from neighboring frames (step 212 in FIG. 2) in the last (fifth) column.


While this disclosure describes a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of such embodiments may be made. In general, the disclosure is to be understood as not limited by the specific embodiments described herein, but only by the scope of the appended claims.


It will also be understood that the presently disclosed subject matter further contemplates a suitably programmed computer for executing the operation as disclosed herein above. Likewise, the presently disclosed subject matter contemplates a computer program being readable by a computer for executing the method as disclosed herein. The presently disclosed subject matter further contemplates a non-transitory computer-readable memory tangibly embodying a program of instructions executable by the computer for executing the method as disclosed herein.


All references mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual reference was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present application.

Claims
  • 1. A system, comprising: a digital camera configured to record an input video stream at an assigned frame rate;an interface operated to mark a first entity in a frame of the input video stream and a slow motion factor of the first entity, and to determine based on the slow motion factor a first frame rate FR1 for playing of the first entity in an output video stream and a second frame rate FR2 different from FR1, for playing of at least one second entity in the output video stream, wherein at least one of FR1 or FR2 is different from the assigned frame rate; anda processor configured to generate the output video stream based on the input video stream of the digital camera, the marked first entity, and the determined FR1 and FR2, wherein the output video stream includes the first entity played at FR1 and the at least one second entity played at FR2.
  • 2. The system of claim 1, wherein the first entity is an object of interest (OOI) or region of interest (ROI) and wherein the at least one second entity is selected from the group consisting of another object, an image foreground, an image background and a combination thereof.
  • 3. The system of claim 2, wherein the interface is operated by a human user.
  • 4. The system of claim 3, wherein the OOI or the ROI is identified in at least one single frame of the input video stream with an object classification or an object segmentation algorithm.
  • 5. The system of claim 4, wherein the OOI or ROI is tracked at least through a part of the input video stream with a tracking algorithm.
  • 6. The system of claim 2, wherein the interface is operated by an application or by an algorithm.
  • 7. The system of claim 1, wherein the output video stream includes at least one added entity played at a frame rate different from FR1 and FR2.
  • 8. The system of claim 1, wherein the given input stream includes at least one given entity played at a frame rate different from FR1 and FR2.
  • 9. The system of claim 1, wherein the processor is further configured to use a depth map stream that is spatially and temporally aligned with the input video stream to generate the output video stream.
  • 10. The system of claim 9, wherein the depth map is used to determine a depth of each entity.
  • 11. The system of claim 9, wherein the depth map is a discrete depth map of several classes, each class covering a range of specific depths.
  • 12. The system of claim 11, wherein an entity is played with a frame rate that depends on the class covering a range of specific depths.
  • 13. The system of claim 9, wherein the depth map is generated using image data of a Time-of-Flight camera.
  • 14. The system of claim 9, wherein the depth map is generated using image data of a stereo camera.
  • 15. The system of claim 9, wherein the depth map is generated using a laser autofocus unit.
  • 16. The system of claim 9, wherein the depth map is generated using Phase Detection Auto Focus.
  • 17. A method, comprising: by a processor configured to obtain an input video stream recorded at an assigned frame rate and to output an output video stream, marking a first entity in a frame of the input video stream;marking a slow motion factor of the first entity;determining based on the slow motion factor a first frame rate FR1 for playing of the first entity in the output video stream and a second frame rate FR2 different from FR1, for playing of at least one second entity in the output video stream, wherein at least one of FR1 or FR2 is different from the assigned frame rate; andgenerating the output video stream, wherein the output video stream includes the first entity played at FR1 and the second entity played at FR2.
  • 18. The method of claim 17, further comprising using a depth map to determine a depth of each entity.
  • 19. The method of claim 17, wherein the given input stream includes at least one given entity played at a frame rate that is different from FR1 and FR2.
  • 20. The method of claim 17, further comprising using a depth map stream that is spatially and temporally aligned with the input video stream to generate the output video stream.
  • 21. The method of claim 20, wherein the depth map is generated by using image data of a Time-of-Flight camera.
  • 22. The method of claim 20, wherein the depth map is generated by using image data of a stereo camera.
  • 23. The method of claim 20, wherein the depth map is generated by using a Laser Autofocus unit.
  • 24. The method of claim 20, wherein the depth map is generated by using Phase Detection Auto Focus.
CROSS REFERENCE TO RELATED APPLICATIONS

This application is related to and claims priority from US Provisional Patent Application No. 62/928,014 filed Oct. 30, 2019, which is incorporated herein by reference in its entirety.

Related Publications (1)
Number Date Country
20210133475 A1 May 2021 US
Provisional Applications (1)
Number Date Country
62928014 Oct 2019 US