Slow or fast motion video using depth information

Information

  • Patent Grant
  • Patent Number
    11,659,135
  • Date Filed
    Monday, October 19, 2020
  • Date Issued
    Tuesday, May 23, 2023
Abstract
Systems comprising a digital camera, an interface operable to mark a first entity in a frame of an input video stream and to determine a frame rate ratio FR1/FR2 between a first frame rate FR1 and a second frame rate FR2, a processor configurable to generate an output video stream of the digital camera, wherein the output video stream includes a first entity played at FR1 and a second entity played at FR2, and methods of using and providing same.
Description
FIELD

Embodiments disclosed herein relate in general to video generation and processing.


BACKGROUND

In known art, a recorded video stream is played at a sequentially constant frame rate (FR), with the option for the user to change the frame rate for all or some sequences of frames and to make these sequences appear in slow motion or time lapse. The slow motion or time lapse video streams are generated from a sequence of input frames that are played at a FR modified with respect to the FR used to capture the scene.


In highly professional setups such as the movie industry, there is an additional method, where the FR is controlled and modified only for some specific spatial information of the input frames. This is done mainly to highlight specific persons, objects or scenes, by playing the areas to be highlighted with a different frame rate than the rest of the frame.


For visual effects and improved user experience, it would be beneficial to have a system and method that plays areas to be highlighted at a different frame rate than the rest of the frame in an automated manner and under the processing power constraints existing in devices such as smartphones or tablets.


SUMMARY

In various embodiments there are provided systems, comprising a digital camera, an interface operable to mark a first entity in a frame of an input video stream and to determine a frame rate ratio FR1/FR2 between a first frame rate FR1 and a second frame rate FR2, and a processor configurable to generate an output video stream of the digital camera, wherein the output video stream includes a first entity played at FR1 and at least one second entity played at FR2.


In an exemplary embodiment, the first entity is an object of interest (OOI) or region of interest (ROI) and the at least one second entity is selected from the group consisting of another object, an image foreground, an image background and a combination thereof.


In an exemplary embodiment, the output video stream includes at least one added entity played at a frame rate that is different from the first FR and the second FR.


In an exemplary embodiment, the given input stream includes at least one given entity played at a frame rate that is different from the first FR and the second FR.


In an exemplary embodiment, the interface is operable by a human user.


In an exemplary embodiment, the interface is operable by an application or by an algorithm.


In an exemplary embodiment, the OOI or the ROI is identified in at least a single frame of the input video stream with an object classification or an object segmentation algorithm.


In an exemplary embodiment, the OOI or ROI is tracked at least through a part of input video stream with a tracking algorithm.


In an exemplary embodiment, the processor is further configured to use a depth map stream that is spatially and temporally aligned with the input video stream to generate the output video stream.


In an exemplary embodiment, the depth map is used to determine a depth of each entity.


In an exemplary embodiment there is provided a method, comprising: in a digital camera configured to obtain an input video stream and to output an output video stream, marking a first entity in a frame of the input video stream, determining a frame rate ratio FR1/FR2 between a first frame rate FR1 and a second frame rate FR2, and generating the output video stream, wherein the output video stream includes a first entity played at FR1 and a second entity played at FR2.


In an exemplary embodiment, the method further comprises using a depth map to determine a depth of each entity.





BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting examples of embodiments disclosed herein are described below with reference to figures attached hereto that are listed following this paragraph. The drawings and descriptions are meant to illuminate and clarify embodiments disclosed herein, and should not be considered limiting in any way:



FIG. 1 illustrates an example video output provided by a method disclosed herein;



FIG. 2 shows a general flow chart of an exemplary embodiment of a method disclosed herein;



FIG. 3 illustrates respective frame rate masks and binned depth maps of a specific image for two different cases;



FIG. 4 presents an example of an image set of a scene containing an RGB image (left side), a depth map (center) and a derived SD map (right side);



FIG. 5A shows a block diagram of an exemplary system used to run a method disclosed herein in a first example;



FIG. 5B shows a block diagram of an exemplary system used to run a method disclosed herein in a second example;



FIG. 5C shows an embodiment of a camera disclosed herein;



FIG. 5D shows an embodiment of a host device disclosed herein;



FIG. 6A shows a video RGB input stream and a video depth image input stream of the same scene;



FIG. 6B shows the FRM generated for the input video streams of FIG. 6A for case A;



FIG. 6C shows the FRM generated for the input video streams of FIG. 6A for case B;



FIG. 7 presents RGB images, depth maps and selected depth masks from frames related to the scene in FIGS. 6A-C that do and do not contain all object information;



FIG. 8A presents RGB images, depth masks and the depth information reconstruction process of case B for complete (first row) and incomplete (second row) depth mask information;



FIG. 8B shows case B selected depth masks and RGB image segments derived with these masks for complete (first row) and incomplete (second row) image information.





DETAILED DESCRIPTION
Definitions

“Entity”: a section or part of an RGB frame with information different from other sections or parts of the frame. Examples of such an entity are objects of interest (OOIs) or regions of interest (ROIs), as well as their respective foreground and background. The objects or regions of interest can be selected manually by the user or automatically by a dedicated algorithm.


“Assigned depth”: depth information on single pixels or segments of an RGB image which is obtained from a depth map that covers the same scene from the same (or similar) point of view (POV) as the RGB image.


“Selected Depth” (SD): depth of one or more selected objects in the RGB image.


“SD+”: depths that are further away from the camera than SD.


“SD−”: depths that are closer to the camera than SD.


“Binned Depth Map” (BDM): a depth map that classifies the originally continuous depth map into a discrete depth map of several classes, each class covering a range of specific depths. Here, we use 2-class and 3-class BDMs.


“Frame Rate Mask” (FRM): a binary mask that includes all pixels that are to be played at a first frame rate (FR1), while the part outside of the mask is played at a second frame rate (FR2). By definition, SD is played at FR1 while SD+ is played at FR2. In a general case, a plurality of FRMs with different frame rates, e.g. FR3, FR4 or FR5, may be provided. In this case, the FRM expands to a mask discriminating 3, 4 or 5 pixel groups. A short illustrative sketch of how a BDM and a FRM may be derived from a depth map follows these definitions.


“PFR1”: group of pixels played in FR1 (marked in white in the FRM presented e.g. in FIG. 3).


“PFR2”: group of pixels played in FR2 (marked in black in the FRM presented e.g. in FIG. 3).


“PFR3”: group of pixels played in FR3 (not shown in the figures herein).
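
As an illustration of the BDM, FRM and PFR definitions above, the following minimal sketch (assuming NumPy arrays, a per-pixel depth map and a selected-depth interval [sd_min, sd_max]; all names are illustrative and not taken from the patent) bins a depth map into the classes SD−, SD and SD+ and derives a binary FRM whose True pixels form PFR1 and whose False pixels form PFR2:

    import numpy as np

    def binned_depth_map(depth, sd_min, sd_max):
        """3-class BDM: 0 = SD- (closer than the selected depth),
        1 = SD (the selected depth), 2 = SD+ (farther than the selected depth)."""
        bdm = np.full(depth.shape, 2, dtype=np.uint8)      # default: SD+
        bdm[depth < sd_min] = 0                            # SD-
        bdm[(depth >= sd_min) & (depth <= sd_max)] = 1     # SD
        return bdm

    def frame_rate_mask(bdm, include_closer=True):
        """Binary FRM: True pixels belong to PFR1 (played at FR1), False pixels
        to PFR2 (played at FR2). include_closer decides whether SD- pixels are
        grouped with the selected depth (PFR1) or with the rest (PFR2)."""
        return bdm <= 1 if include_closer else bdm == 1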



FIG. 1 illustrates an example video output provided by a method disclosed herein. The figure shows nine consecutive frames 1-9 of a video stream, with a left column showing original frames (input data) and with a right column showing output (or generated) frames (output data). The video stream includes two objects, a first object 102 (a runner distanced farther from a viewer, i.e. in the “back”) and a second object 104 (a runner distanced closer to a viewer, i.e. in the “front”). For simplicity, numerals 102 and 104 are shown only in frames 1 and 9. In the original video, object 104 is running faster than object 102. In the shown output video, object 104 is selected to be played two times slower than in the original video. The outcome is that object 104 is now seen running slower than object 102.



FIG. 2 shows a general flow chart of an exemplary embodiment of a method disclosed herein. A video stream (sequence of N frames) 202 recorded at a certain user- or application-assigned frame rate FR is used as input. In step 204, the user or application marks an object of interest (e.g. object 104) or region of interest and a relative velocity of the OOI or the ROI. In general, the OOI or ROI is only marked in one of the frames, e.g. in the 1st of the N frames. The relative velocity (or “slow motion factor”) of the OOI or the ROI defines a frame rate ratio between the frame rate with which the OOI or the ROI is played, and the frame rate at which the foreground and/or background are played. In step 206, frames used for generating an output stream are selected. These are referred to henceforth as “selected frames”.


In a first example and with reference to FIG. 1, one wants to make object 104 (and optionally additional segments of the frames) move half as fast as in the original video stream, corresponding to a relative velocity (slow motion factor) and a frame rate ratio of 2. At least two frames need to be selected in order to obtain information on the movement of OOI 104 and on the movement of foreground FG and background BG (i.e. all the pixels in the frame except object 104). If one wants to achieve the given effect with fewer than four frames, movement models predicting the inter-frame movement have to be deployed.


The selection of at least two frames may be made in various ways. One option is presented in Table 1,

Table 1

OutIdx   1   2   3   4   5   6   7   8
ObjIdx   1   1   2   2   3   3   4   4
BGIdx    1   2   3   4   5   6   7   8

where ObjIdx is the index of the input frame from which the OOI (i.e. object 104) is taken, BGIdx is the index of the input frame from which the background is taken, and OutIdx is the index of the respective output frame.


In step 208, the OOIs are detected in the at least two selected frames. In step 210, the algorithm calculates a segmentation mask for the OOI. In step 212, data missing (e.g. caused by occlusion) in the at least four selected frames is filled in from frames other than the selected frames (for example neighboring frames). In step 214, data and information generated in steps 204-212 is processed to generate a new frame. Newly generated frames are assembled into an output video stream 216.
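
Steps 204-216 can be sketched as a small pipeline; the per-stage callables (index selection, segmentation, hole filling, composition) are supplied by the caller, since the patent leaves the concrete algorithms open. This is an illustrative skeleton, not the patent's implementation:

    def generate_output_stream(frames, depths, select_indices, segment_ooi,
                               fill_holes, compose_frame, sm_factor):
        """Illustrative skeleton of the flow of FIG. 2: for every output frame,
        pick the input frames for object and background (step 206), segment the
        OOI (steps 208-210), fill occlusion holes in the background (step 212)
        and compose the new frame (step 214) into the output stream (216)."""
        output = []
        for out_idx in range(1, len(frames) + 1):
            obj_idx, bg_idx = select_indices(out_idx, sm_factor)
            mask = segment_ooi(frames[obj_idx - 1], depths[obj_idx - 1])
            background = fill_holes(frames[bg_idx - 1], mask, frames, bg_idx)
            output.append(compose_frame(frames[obj_idx - 1], background, mask))
        return output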


In this example, one can write a general equation:

ObjIdx = ceil(OutIdx / SMfactor),  BGIdx = OutIdx,

where ceil(x) returns the smallest integer that is greater than or equal to x (i.e. rounds up to the nearest integer) and SMfactor is the slow-motion factor of the object (in this example SMfactor=2).
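
A minimal Python rendering of this index selection (the function name is illustrative) reproduces Table 1 for SMfactor = 2:

    import math

    def select_indices_slow(out_idx, sm_factor=2):
        """Slow-motion case (Table 1): the OOI advances sm_factor times slower
        than the background."""
        return math.ceil(out_idx / sm_factor), out_idx   # (ObjIdx, BGIdx)

    # OutIdx 1..8 -> ObjIdx 1,1,2,2,3,3,4,4 and BGIdx 1..8
    assert [select_indices_slow(i) for i in range(1, 9)] == \
           [(1, 1), (1, 2), (2, 3), (2, 4), (3, 5), (3, 6), (4, 7), (4, 8)]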


In a second example, one wants to make object 104 move twice as fast as in the original video stream. Again, at least two frames need to be selected. One option is presented in Table 2.

Table 2

OutIdx   1   2   3   4   5   6   7   8
ObjIdx   1   2   3   4   5   6   7   8
BGIdx    1   1   2   2   3   3   4   4

In this example, the general equation is:

ObjIdx = OutIdx,  BGIdx = ceil(OutIdx / SMfactor).
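
Analogously, a small sketch of the fast-motion index selection (illustrative name) reproduces Table 2 for SMfactor = 2:

    import math

    def select_indices_fast(out_idx, sm_factor=2):
        """Fast-motion case (Table 2): the background advances sm_factor times
        slower than the OOI, so the OOI appears sm_factor times faster."""
        return out_idx, math.ceil(out_idx / sm_factor)   # (ObjIdx, BGIdx)

    # OutIdx 1..8 -> ObjIdx 1..8 and BGIdx 1,1,2,2,3,3,4,4
    assert [select_indices_fast(i)[1] for i in range(1, 9)] == [1, 1, 2, 2, 3, 3, 4, 4]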






Given a video of RGB images, i.e. frames F = {fi}, i = 1 . . . NFrames, and a depth map overlaying each frame, D = {di}, i = 1 . . . NFrames, methods disclosed herein generate new videos in which the pixel groups PFR1 and PFR2 are played at different frame rates. The depth map can be obtained using for example stereo-camera triangulation, depth from motion, gated imaging, time of flight (TOF) cameras, coded aperture based cameras, a Laser Auto-focus unit (“Laser AF”), an image sensor with Phase Detection Auto Focus (“PDAF”) capability, etc. In depth maps shown herein, the gray scale depicts the respective depth (white=zero distance from camera, black=infinite distance from camera). The depth maps or images discussed herein are assumed to be captured from the same (or a similar) POV as the RGB images shown along with them, and to be captured substantially simultaneously with those RGB images.


For the sake of clarity the term “substantially” is used herein to imply the possibility of variations in values within an acceptable range. For example, “substantially simultaneously” may refer to the capture of frames for two video streams within ±5 ms, ±10 ms, ±20 ms or even ±30 ms. Likewise, “substantially simultaneously” may refer to the synchronization of frames from two video streams within ±5 ms, ±10 ms, ±20 ms or even ±30 ms.
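
One simple way to enforce such a tolerance when pairing RGB and depth frames is sketched below, assuming each stream exposes per-frame capture timestamps in milliseconds (an illustration, not part of the disclosure):

    def pair_frames_by_time(rgb_times_ms, depth_times_ms, tolerance_ms=20):
        """For each RGB frame, pick the depth frame captured closest in time and
        keep the pair only if the offset is within the tolerance (e.g. +/-20 ms)."""
        pairs = []
        for i, t_rgb in enumerate(rgb_times_ms):
            j = min(range(len(depth_times_ms)),
                    key=lambda k: abs(depth_times_ms[k] - t_rgb))
            if abs(depth_times_ms[j] - t_rgb) <= tolerance_ms:
                pairs.append((i, j))
        return pairs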


We distinguish two cases for the frame rate of segments of the image that are closer to the camera (i.e. SD−):


Case A (Example 1): the OOI or ROI and image segments closer to the camera than the OOI or ROI (foreground FG) are played at FR1, while image segments farther from the camera than the OOI or ROI (background BG) are played at FR2. SD− is played at the same FR as SD (i.e. FR1) and all the other depths are played at FR2. Thus PFR1=SD∪SD− and PFR2=SD+. In this case, we do not need to indicate where the pixels of SD− are, since they are played at the same FR as SD, such that OOIs or ROIs at SD will never be occluded. Therefore, we obtain FRM=BDM.


Case B (Example 2): only the OOI or ROI is played at FR1, while both FG and BG are played at FR2. SD− is played at the same FR as SD+ (i.e. FR2). Thus PFR1=SD and PFR2=SD+∪SD−. Since SD and SD− are played at different frame rates, some information will be missing in the newly generated frames because of occlusions.


In an additional, third example, different “depth slices” (parts of the image with a certain corresponding depth range), for example a first depth slice 1: 0.5-1 m, a second depth slice 2: 1-2 m, and a third depth slice 3: 2-4 m, are played with different FRs. For example, the RGB information of depth slice 1 is played at FR1, the RGB information of depth slice 2 is played at FR2, the RGB information of depth slice 3 is played at FR3, etc. In some examples it may be FR1<FR2<FR3 etc., or vice versa. In other examples, there may not be such a FR order according to depth. This slicing principle may be used to, for example, highlight an OOI or ROI by leaving the OOI or ROI unmoved, and letting the BG move faster the farther away it is from the OOI or ROI. In some examples, artificial objects may be added to one or more of the depth slices. An artificial object may be an artificially created object such as an object drawn manually or by a computer. An artificial object may be image data not included in one of the images of the input video stream. In some examples, an artificial object may be image data from an image captured with another camera of a same host device.
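
A sketch of such a depth-slice to frame-rate assignment is given below; the slice boundaries follow the example above, while the (near, far, frame rate) tuple format and the fallback behavior are assumptions:

    def frame_rate_for_depth(depth_m,
                             slices=((0.5, 1.0, 'FR1'), (1.0, 2.0, 'FR2'), (2.0, 4.0, 'FR3'))):
        """Map a pixel depth in meters to a playback frame rate label according
        to depth slices given as (near, far, frame_rate) tuples."""
        for near, far, fr in slices:
            if near <= depth_m < far:
                return fr
        return None  # outside all slices: fall back to the default stream FR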


In other examples, a physical property of entities (e.g. an OOI or ROI) other than depth may be used for defining object, FG and BG. A physical property may be spectral composition. In yet other examples, visual data such as texture of entities (e.g. an OOI or ROI) may be used for defining object, FG and BG.



FIG. 3 illustrates respective FRMs and BDMs of a specific image for cases A and B. Here, the OOI is a dancing girl 302. Corresponding to this image is a depth map of the same scene (not shown here). Here, a depth of the scene is assumed which increases constantly for larger Y values. A constantly increasing depth is e.g. shown in FIG. 4. In the given scene, a runner 304 is closer to the camera than girl 302. In case A, the FRM and BDM include both dancing girl 302 and runner 304 (as well as all other pixels with assigned depth smaller than that of girl 302). In case B, the FRM only includes girl 302, as well as pixel groups of the BG with assigned depth equal to the assigned depth of girl 302. For case B, the BDM is differentiated into three pixel groups with different assigned depths: the depth of OOI 302 (SD, white), a depth larger than the depth of OOI 302 (SD+, black), and a depth smaller than the depth of OOI 302 (SD−, gray).



FIG. 4 presents an example of an image set containing an RGB image (left side), next to a depth map (center) covering the same scene from the same (or very similar) point of view (POV) as that of the RGB image, and next to an SD map (right side) derived according to a method disclosed herein. The specific SD is chosen based on the RGB image and depth map data. A runner 402 is closest to the camera, a girl 404 is farther away from the camera, and a boy 406 is at the farthest distance from the camera. Here, girl 404 is defined as the OOI, leading to the presented specific SD.



FIG. 5A presents a block diagram of a processor numbered 500 in a system disclosed herein and used for case A. The following notations are used: foreground (FG) and background (BG) respective frames fFGIdx and fBGIdx, corresponding respective masks mFGIdx and mBGIdx, generated respective images f̃FGIdx and f̃BGIdx, and a composed new output frame fOutIdx.


Processor 500 may be for example an application processor of a smartphone or a tablet. In processor 500, the input frames of the RGB video stream 502 and of the depth map video stream 504 constitute the data inputs for the method disclosed herein. Depending on a FR speed chosen by a human user (e.g. manually) or chosen by a dedicated algorithm (e.g. automatically), indices of the frames to be used for the output video stream are selected by a FG and BG index selector module 506. These indices are the input for a mask generator module 508 that performs step 210 in FIG. 2. Depending on objects or areas of interest in the RGB image (also chosen by the human user or by the dedicated algorithm), the frames with the indices selected in 506 are requested from a frame and depth selector module 508a. Masks defining the areas that are played with different FRs are calculated in a mask extractor module 508b for the foreground FG, and in a mask extractor module 508c for the background BG. From module 508c, information is fed into a hole filler module 512, where missing information (e.g. missing because of occlusion of the object or area of interest by another object) is replaced by information calculated from input frames of RGB video stream 502 and depth map video stream 504 other than the ones actually used for the output video stream. A new frame generator module 514 assembles the information and outputs the newly generated video stream.



FIG. 5B presents a block diagram of processor numbered 500′ in a system disclosed herein and used for case B. In addition to modules and functions of processor 500 in FIG. 5A, processor 500′ includes an additional selected depth object estimator module 516, in which the depth of the selected object or area is estimated in case the selected object or area is occluded by another object.


Because of the more complex FRM deployed in case B compared to case A, this information must be generated, e.g. by estimation from other frames of the depth map video stream (e.g. neighboring frames), e.g. by deploying a motion model. Module 512 that computes f̃BGIdx remains practically the same as in case A, except for the mask mBGIdx that is passed to module 512. In contrast with case A, the mask now includes only the selected depth and not SD−.



FIG. 5C shows an embodiment of a camera disclosed herein and numbered 520. Camera 520 includes camera elements such as optical components (i.e. a lens system) 522 and an image sensor 524. Camera 520 may be a multi-camera system that has more than one lens system and image sensor. Images and video streams recorded via lens system 522 and image sensor 524 may be processed in an application processor 526 that interacts with a memory 528. A human user can trigger actions in the camera via a human machine interface “HMI” (or simply “interface”) 532. Information that supports actions such as generation of artificial image data and information may be stored in a database 534. In various embodiments, one or more of the components application processor 526, memory 528, HMI 532 and database 534 may be included in the camera. In some embodiments (such as in FIG. 5D) application processor 526, memory 528, HMI 532 and database 534 may be external to the camera.



FIG. 5D shows an embodiment of a host device disclosed herein and numbered 540, for example a smartphone or tablet. Device 540 comprises a camera 542, application processor 526, memory 528, HMI 532 and database 534. In some embodiments, database 534 may be virtual, with information not located physically on the device, but located on an external server, e.g. on a cloud server. Device 540 may comprise a multi camera system, e.g. several cameras for capturing RGB images and one or more additional sensing cameras, e.g. a time of flight (TOF) camera sensing depth information of a scene.


In some examples, camera 542 may provide the video stream input for the method described herein. In other examples, the video stream input may be supplied from outside a host device, e.g. via a cloud server.



FIGS. 6A-6C depict the generation of FRMs for the cases A and B outlined below. FIG. 6A shows two input video streams of the same scene as in FIG. 4, one input stream (left) being of RGB images (also referred to as “RGB image stream”), the other input stream (right) being of depth images (also referred to as “depth image stream”). As in FIG. 4, the images include runner 402, girl (OOI) 404 and boy 406. FIG. 6B shows the FRM generated for the input video streams of FIG. 6A for case A. FIG. 6C shows the FRM generated for the input video streams of FIG. 6A for case B. In input frame 4, we find that girl 404 is partly occluded by runner 402.


In FIG. 6B, the FRM includes the selected depth SD and all the depths closer to the camera (SD−). In this case, the mask that needs to be extracted from the depth image is a binary mask that indicates where SD and SD− are located in the RGB image. In the binary mask, “1” (white) represents the regions of SD and SD− and “0” (black) represents all other depths. Foreground 408 and background 412 refer to segments of the image that have an assigned depth that is smaller and larger than the selected depth, respectively.


The following describes in more detail a general method to provide effects like those in the first and second examples above. In step 206 (FIG. 2), two frames are extracted from the input video streams. An output frame will be composed of these two frames, one frame being used for forming the background fBGIdx and the other frame being used for forming the foreground fFGIdx. BGIdx and FGIdx are indices that indicate which frames from the input frames F are selected; thus BGIdx, FGIdx∈[1, 2 . . . NFrames].


Once the indices from the input frames are chosen, the selected depth masks for the images need to be extracted.


The next step after the extraction of BGIdx and FGIdx is to select the FG and BG frames fFGIdx and fBGIdx together with their corresponding masks mFGIdx and mBGIdx, and to generate the two image frames f̃BGIdx and f̃FGIdx that will be combined (“stitched”) together to compose the new output frame fOutIdx. Since the regions of selected depths are never occluded, f̃FGIdx can be obtained directly from the input frame and the corresponding mask. Therefore, f̃FGIdx = fFGIdx·mFGIdx.


To obtain f̃BGIdx, we need to delete the region in the image where mBGIdx indicates the selected depth, and fill this region with the background. To delete the region with the selected depth, we can for example use f′BGIdx = (1−mBGIdx)·fBGIdx. To fill the missing information in the background, we can use methods such as in-painting (see e.g. Bertalmio, Marcelo, Andrea L. Bertozzi, and Guillermo Sapiro. “Navier-Stokes, fluid dynamics and image and video inpainting.” Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. Vol. 1. IEEE, 2001) or utilize information from consecutive frames (see e.g. Jia, Yun-Tao, Shi-Min Hu, and Ralph R. Martin. “Video completion using tracking and fragment merging.” The Visual Computer 21.8-10 (2005): 601-610). The indices of the input frames which will be used to fill the holes in f′BGIdx are [BGIdx−k, BGIdx+k], where k is a parameter that indicates the number of consecutive frames taken from each side of fBGIdx. In general, k does not have to be constant and can be different from frame to frame, in which case it will be marked as kOutIdx.
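
As one possible concretization of the deletion and in-painting steps, the sketch below uses OpenCV's Navier-Stokes in-painting (cf. the Bertalmio et al. reference above); the array layout and the in-painting radius are assumptions:

    import cv2
    import numpy as np

    def delete_and_inpaint(frame_bgr, sd_mask):
        """Zero out the selected-depth region of a background frame (f'_BGIdx)
        and fill the hole by in-painting. frame_bgr: uint8 HxWx3 image;
        sd_mask: boolean HxW mask that is True at the selected depth."""
        hole = sd_mask.astype(np.uint8) * 255
        cleared = frame_bgr.copy()
        cleared[sd_mask] = 0                          # delete the selected-depth region
        return cv2.inpaint(cleared, hole, 3, cv2.INPAINT_NS)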


Once we have f̃BGIdx and f̃FGIdx, they can be stitched together using mFGIdx and methods described for example in Burt, Peter J., and Edward H. Adelson. “A multiresolution spline with application to image mosaics.” ACM Transactions on Graphics (TOG) 2.4 (1983): 217-236.
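
A lightweight stand-in for the multiresolution-spline blending cited above is a feathered alpha blend of the generated FG and BG images using the FG mask; the sketch below assumes uint8 images and a Gaussian feather width chosen only for illustration:

    import cv2
    import numpy as np

    def stitch_fg_bg(fg_bgr, bg_bgr, fg_mask, feather_sigma=5):
        """Compose the output frame from the generated FG and BG images using a
        Gaussian-feathered version of the FG mask as blend weights."""
        alpha = cv2.GaussianBlur(fg_mask.astype(np.float32), (0, 0), feather_sigma)
        alpha = np.clip(alpha, 0.0, 1.0)[..., None]    # HxWx1 weights in [0, 1]
        out = alpha * fg_bgr + (1.0 - alpha) * bg_bgr
        return np.clip(out, 0, 255).astype(np.uint8)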


In case A, we used the depth map to detect the selected depth and all the depths closer to the camera, which were to be played at the same FR. As a result, in each frame the objects in the RGB image at the selected depth contain all the information needed to generate the new frame of the output video.


In case B, we use the depth map in order to detect the regions of selected depth, which are to be played at the same FR. All regions with other corresponding depths are to be played at a different FR. Here, in general, the object in the selected depth will not contain all the information needed to compose the new frame (see e.g. input frame 4 in FIG. 6A), and there is a need to generate this information, e.g. by algorithms generating artificial input based on prior “experience”, or from other frames, e.g. from subsequent consecutive frames (e.g. by using a motion model). In this case, it is possible that an object that is closer to the camera than the selected depth will occlude parts of the objects in the selected depth, so that the FG frame and the corresponding mask will have holes where data is missing.



FIG. 6C shows the FRM generated for the input video streams of FIG. 6A for case B. In input frame 4, we find that girl 404 is partly occluded by runner 402.


The selection of the frame indices from the input remains the same as in case A. The mask extracted from the depth image for the selected depth, mFGIdx, does not contain all the information for the objects in the selected depth, and therefore a new mask m̃FGIdx needs to be defined. This mask is not extracted from the depth image, but estimated, e.g. by using information from other frames.


The information of the object in the selected depth that exists in fFGIdx will be referred to as f′FGIdx = fFGIdx·mFGIdx. The frame with full object information within the selected depth (derived e.g. based on information from consecutive frames) is given by f̃FGIdx.
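
As a very simple stand-in for that estimation (a real implementation would motion-compensate the neighboring mask as described above), the occluded mask can be completed from a registered mask of a neighboring frame:

    import numpy as np

    def complete_fg_mask(m_fg, m_fg_neighbor):
        """Estimate the complete selected-depth mask for case B as the union of
        the occluded mask and a (registered) neighboring-frame mask. Also return
        the pixels that had to be generated (the gray regions in FIG. 8A)."""
        m_full = np.logical_or(m_fg, m_fg_neighbor)
        generated = np.logical_and(m_full, np.logical_not(m_fg))
        return m_full, generated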



FIG. 7 presents RGB images (first column), corresponding depth maps (second column) and the selected depth masks (third column for case A, fourth column for case B) from frames related to the scene in FIGS. 6A-C that do and do not contain all information of the object in the selected depth. The situation of missing data in case B as described above is illustrated in the second row of FIG. 7, where information on the mask is missing because of occlusion of an object by another object.


In case A, the mask is a binary mask. In case B, the mask is a mask with three values: 0 (black), 0.5 (gray) and 1 (white).



FIG. 8A presents the same RGB images as shown in FIG. 7 (first column), with the same corresponding depth maps as in FIG. 7 (not shown again here), and the information reconstruction process for the depth map part (second to fifth column) for case B, both for the case of complete depth map information (row 1) and for the case of incomplete depth map information because of occlusion (row 2). In the second row and fourth column, selected depth masks are presented that partly need to be generated, e.g. by estimations based on information of other frames; the gray parts in the mask are parts that are to be generated. In the fifth column, the selected depth mask with the generated information is shown. This depth mask can further on be used for the composition of the new output frame.



FIG. 8B shows, along with the selected depth masks (second and fourth column), the respective masked RGB image segments (third column) and the RGB output frame of the computational step which fills missing data from neighboring frames (step 212 in FIG. 2) in the last (fifth) column.


While this disclosure describes a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of such embodiments may be made. In general, the disclosure is to be understood as not limited by the specific embodiments described herein, but only by the scope of the appended claims.


It will also be understood that the presently disclosed subject matter further contemplates a suitably programmed computer for executing the operation as disclosed herein above. Likewise, the presently disclosed subject matter contemplates a computer program being readable by a computer for executing the method as disclosed herein. The presently disclosed subject matter further contemplates a non-transitory computer-readable memory tangibly embodying a program of instructions executable by the computer for executing the method as disclosed herein.


All references mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual reference was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present application.

Claims
  • 1. A system, comprising: a digital camera configured to record an input video stream at an assigned frame rate;an interface operated to mark a first entity in a frame of the input video stream and a slow motion factor of the first entity, and to determine based on the slow motion factor a first frame rate FR1 for playing of the first entity in an output video stream and a second frame rate FR2 different from FR1, for playing of at least one second entity in the output video stream, wherein at least one of FR1 or FR2 is different from the assigned frame rate; anda processor configured to generate the output video stream based on the input video stream of the digital camera, the marked first entity, and the determined FR1 and FR2, wherein the output video stream includes the first entity played at FR1 and the at least one second entity played at FR2.
  • 2. The system of claim 1, wherein the first entity is an object of interest (OOI) or region of interest (ROI) and wherein the at least one second entity is selected from the group consisting of another object, an image foreground, an image background and a combination thereof.
  • 3. The system of claim 2, wherein the interface is operated by a human user.
  • 4. The system of claim 3, wherein the OOI or the ROI is identified in at least one single frame of the input video stream with an object classification or an object segmentation algorithm.
  • 5. The system of claim 4, wherein the OOI or ROI is tracked at least through a part of the input video stream with a tracking algorithm.
  • 6. The system of claim 2, wherein the interface is operated by an application or by an algorithm.
  • 7. The system of claim 1, wherein the output video stream includes at least one added entity played at a frame rate different from FR1 and FR2.
  • 8. The system of claim 1, wherein the given input stream includes at least one given entity played at a frame rate different from FR1 and FR2.
  • 9. The system of claim 1, wherein the processor is further configured to use a depth map stream that is spatially and temporally aligned with the input video stream to generate the output video stream.
  • 10. The system of claim 9, wherein the depth map is used to determine a depth of each entity.
  • 11. The system of claim 9, wherein the depth map is a discrete depth map of several classes, each class covering a range of specific depths.
  • 12. The system of claim 11, wherein an entity is played with a frame rate that depends on the class covering a range of specific depths.
  • 13. The system of claim 9, wherein the depth map is generated using image data of a Time-of-Flight camera.
  • 14. The system of claim 9, wherein the depth map is generated using image data of a stereo camera.
  • 15. The system of claim 9, wherein the depth map is generated using a laser autofocus unit.
  • 16. The system of claim 9, wherein the depth map is generated using Phase Detection Auto Focus.
  • 17. A method, comprising: by a processor configured to obtain an input video stream recorded at an assigned frame rate and to output an output video stream, marking a first entity in a frame of the input video stream;marking a slow motion factor of the first entity;determining based on the slow motion factor a first frame rate FR1 for playing of the first entity in the output video stream and a second frame rate FR2 different from FR1, for playing of at least one second entity in the output video stream, wherein at least one of FR1 or FR2 is different from the assigned frame rate; andgenerating the output video stream, wherein the output video stream includes the first entity played at FR1 and the second entity played at FR2.
  • 18. The method of claim 17, further comprising using a depth map to determine a depth of each entity.
  • 19. The method of claim 17, wherein the given input stream includes at least one given entity played at a frame rate that is different from FR1 and FR2.
  • 20. The method of claim 17, further comprising using a depth map stream that is spatially and temporally aligned with the input video stream to generate the output video stream.
  • 21. The method of claim 20, wherein the depth map is generated by using image data of a Time-of-Flight camera.
  • 22. The method of claim 20, wherein the depth map is generated by using image data of a stereo camera.
  • 23. The method of claim 20, wherein the depth map is generated by using a Laser Autofocus unit.
  • 24. The method of claim 20, wherein the depth map is generated by using Phase Detection Auto Focus.
CROSS REFERENCE TO RELATED APPLICATIONS

This application is related to and claims priority from US Provisional Patent Application No. 62/928,014 filed Oct. 30, 2019, which is incorporated herein by reference in its entirety.

Related Publications (1)
Number Date Country
20210133475 A1 May 2021 US
Provisional Applications (1)
Number Date Country
62928014 Oct 2019 US