This disclosure relates to editing videos based on movements captured within the videos.
Shakiness of image sensor(s) that capture video content may result in shaky video content. A user may wish to edit the video content to remove shaky portions. Determining shakiness of portions of video content may be resource intensive (e.g., time, memory, processing power).
This disclosure relates to editing videos based on movements captured within the videos. Video information defining video content may be accessed. The video content may include video frames. Motion vectors for the video frames may be determined. The motion vectors may represent motion of one or more visuals captured within individual video frames. A transformation matrix for the video frames may be determined based on the motion vectors. The transformation matrix may characterize rigid transformations between pairs of the video frames. Shakiness metrics for the video frames may be determined based on the transformation matrix. A shakiness threshold may be obtained. One or more of the video frames may be identified based on the shakiness metrics and the shakiness threshold. A video summary of the video content may be generated. The video summary may include the one or more identified video frames.
A system that edits videos based on movements captured within the videos may include one or more of physical storage media, processors, and/or other components. The physical storage media may store video information defining video content. Video content may refer to media content that may be consumed as one or more videos. Video content may include one or more videos stored in one or more formats/containers, and/or other video content. The video content may have a progress length. In some implementations, the video content may include one or more of spherical video content, virtual reality content, and/or other video content. The video content may include video frames. The video frames may be characterized by a width, a height, and/or other characteristics.
The processor(s) may be configured by machine-readable instructions. Executing the machine-readable instructions may cause the processor(s) to facilitate editing videos based on movements captured within the videos. The machine-readable instructions may include one or more computer program components. The computer program components may include one or more of an access component, a motion vector component, a transformation matrix component, a shakiness metric component, a shakiness threshold component, an identification component, a video summary component, and/or other computer program components.
The access component may be configured to access the video information defining one or more video content and/or other information. The access component may access video information from one or more storage locations. The access component may be configured to access video information defining one or more video content during acquisition of the video information and/or after acquisition of the video information by one or more image sensors.
The motion vector component may be configured to determine motion vectors for the video frames. The motion vectors may represent motion of one or more visuals captured within individual video frames, and/or other information. In some implementations, the motion vectors for the video frames may be determined based on movement of a pixel group and/or other information. In some implementations, the pixel group may include an eight-by-eight pixel block.
The transformation matrix component may be configured to determine a transformation matrix for the video frames based on the motion vectors and/or other information. The transformation matrix may characterize rigid transformations between pairs of the video frames and/or other information. In some implementations, the rigid transformations may be estimated using a RANSAC (RANdom SAmple Consensus) algorithm and/or other information.
The shakiness metric component may be configured to determine shakiness metrics for the video frames. The shakiness metric for the video frames may be determined based on the transformation matrix and/or other information. In some implementations, the shakiness metrics for the video frames may be determined based on a rate of change in a lateral translation, a rate of change in a vertical translation, a rate of change in an angular translation, and/or a rate of change in a linear first-order approximation of a scaling factor of the transformation matrix. In some implementations, the rate of change in the lateral translation may be normalized based on a width of the video content, the rate of change in the vertical translation may be normalized based on a height of the video content, the rate of change in the angular translation may be matched to the lateral translation and the vertical translation via a first matching factor, and/or the rate of change in the linear first-order approximation of the scaling factor may be matched to the lateral translation and the vertical translation via a second matching factor.
The shakiness threshold component may be configured to obtain a shakiness threshold and/or other information. In some implementations, the shakiness threshold may be obtained based on user input.
The identification component may be configured to identify one or more of the video frames based on the shakiness metrics, the shakiness threshold, and/or other information. In some implementations, one or more of the video frames may be identified based on comparison of the shakiness metrics of the video frames and the shakiness threshold and/or other information. In some implementations, identification may also be content-dependent based on hysteresis algorithms with multi-knee parameters or finite-state machines.
The video summary component may be configured to generate a video summary of the video content. The video summary may include the one or more identified video frames and/or other information. In some implementations, the video summary component may be configured to apply one or more filters to remove noise from the video frames. In some implementations, the video content may have been captured at thirty frames per second, and one or more filters may be applied using a time scale of five frames. In some implementations, the video content may have been captured at sixty frames per second, and one or more filters may be applied using a time scale of ten frames.
These and other objects, features, and characteristics of the system and/or method disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention. As used in the specification and in the claims, the singular form of “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
Storage media 12 may be configured to include electronic storage medium that electronically stores information. Storage media 12 may store software algorithms, information determined by processor 11, information received remotely, and/or other information that enables system 10 to function properly. For example, storage media 12 may store information relating to video information, video content, video frames, motion vectors, transformation matrix, shakiness metric, shakiness threshold, video summary, and/or other information.
Storage media 12 may store video information 20 defining one or more video content. Video content may refer to media content that may be consumed as one or more videos. Video content may include one or more videos stored in one or more formats/containers, and/or other video content. A video may include a video clip captured by a video capture device, multiple video clips captured by a video capture device, and/or multiple video clips captured by separate video capture devices. A video may include multiple video clips captured at the same time and/or multiple video clips captured at different times. A video may include a video clip processed by a video application, multiple video clips processed by a video application, and/or multiple video clips processed by separate video applications.
Video content may include video frames. The video frames may be characterized by a width, a height, and/or other characteristics. For example,
Video content may have a progress length. A progress length may be defined in terms of time durations and/or frame numbers. For example, video content may include a video having a time duration of 60 seconds. Video content may include a video having 1800 video frames. Video content having 1800 video frames may have a play time duration of 60 seconds when viewed at 30 frames/second. Other time durations and frame numbers are contemplated.
In some implementations, video content may include one or more of spherical video content, virtual reality content, and/or other video content. Spherical video content may refer to a video capture of multiple views from a single location. Spherical video content may include a full spherical video capture (360 degrees of capture) or a partial spherical video capture (less than 360 degrees of capture). Spherical video content may be captured through the use of one or more cameras/image sensors to capture images/videos from a location. The captured images/videos may be stitched together to form the spherical video content.
Virtual reality content may refer to content that may be consumed via a virtual reality experience. Virtual reality content may associate different directions within the virtual reality content with different viewing directions, and a user may view a particular direction within the virtual reality content by looking in a particular direction. For example, a user may use a virtual reality headset to change the user's direction of view. The user's direction of view may correspond to a particular direction of view within the virtual reality content. For example, a forward looking direction of view for a user may correspond to a forward direction of view within the virtual reality content.
Spherical video content and/or virtual reality content may have been captured at one or more locations. For example, spherical video content and/or virtual reality content may have been captured from a stationary position (e.g., a seat in a stadium). Spherical video content and/or virtual reality content may have been captured from a moving position (e.g., a moving bike). Spherical video content and/or virtual reality content may include video capture from a path taken by the capturing device(s) in the moving position. For example, spherical video content and/or virtual reality content may include video capture from a person walking around in a music festival.
Referring to
Access component 102 may be configured to access video information defining one or more video content and/or other information. Access component 102 may access video information from one or more storage locations. A storage location may include storage media 12, electronic storage of one or more image sensors (not shown in
Motion vector component 104 may be configured to determine motion vectors for the video frames. Motion vectors may represent motion of one or more visuals captured within individual video frames and/or other information. Motion may exist within video frames due to motion of image sensor(s) that captured the video frames and/or due to motion of a thing captured within the video frames. Motion vectors may be determined using one or more of block-matching algorithm, phase correlation and frequency domain methods, pixel recursive algorithms, optical flow, feature detection, and/or other criteria matching methods.
The motion vectors for the video frames may be determined based on movement of a pixel group and/or other information. Determining motion vectors based on movement of a pixel group rather than on movement of individual pixels may increase the speed with which motion vectors may be determined. Determining motion vectors based on movement of a pixel group rather than on movement of individual pixels may require fewer resources (e.g., memory, processing power). In some implementations, the pixel group may include an eight-by-eight pixel block. For example,
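A minimal, hypothetical sketch of block-based motion vector determination is shown below, assuming grayscale video frames supplied as NumPy arrays. The eight-by-eight block size follows the text above; the eight-pixel search range and the sum-of-absolute-differences cost are illustrative assumptions, not requirements of the disclosure.

```python
import numpy as np

def block_motion_vectors(prev_frame, next_frame, block=8, search=8):
    """Estimate one (dx, dy) motion vector per block using SAD block matching."""
    h, w = prev_frame.shape
    vectors = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            ref = prev_frame[y:y + block, x:x + block].astype(np.int32)
            best, best_dxdy = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue
                    cand = next_frame[yy:yy + block, xx:xx + block].astype(np.int32)
                    sad = np.abs(ref - cand).sum()  # sum of absolute differences
                    if best is None or sad < best:
                        best, best_dxdy = sad, (dx, dy)
            vectors.append((x, y, *best_dxdy))
    return np.array(vectors)  # columns: block x, block y, dx, dy
```

Working per eight-by-eight block reduces the number of matches from one per pixel to one per sixty-four pixels, which is the source of the speed and memory savings noted above.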
Transformation matrix component 106 may be configured to determine a transformation matrix for the video frames based on the motion vectors and/or other information. Motion vectors may characterize a rotation angle and two translations between pairs of video frames. A transformation matrix may characterize rigid transformations between pairs of video frames and/or other information. For example, a transformation matrix may characterize rigid transformation between video frame A 302 and video frame B 304. In some implementations, rigid transformations may be estimated using a RANSAC (RANdom SAmple Consensus) algorithm and/or other information. Parameters for a transformation matrix may include x(t), y(t), θ(t), and/or other parameters. The term θ(t) may characterize rotations as a function of time (angular translation), and the terms x(t) and y(t) may characterize translations in the x-direction (lateral translation) and y-direction (vertical translation), respectively, as a function of time. Other directions are contemplated.
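A sketch of how the rigid transformation between a pair of video frames might be estimated from the motion vectors with a RANSAC-based estimator follows. OpenCV's estimateAffinePartial2D is used here only as one readily available implementation; the disclosure does not name a particular library, and the parameter extraction assumes the similarity-transform layout of its 2×3 output matrix.

```python
import cv2
import numpy as np

def rigid_transform_params(vectors):
    """vectors: array of (x, y, dx, dy) rows from the motion vector step."""
    src = vectors[:, :2].astype(np.float32)                        # block positions in frame A
    dst = (vectors[:, :2] + vectors[:, 2:4]).astype(np.float32)    # matched positions in frame B
    m, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    if m is None:
        raise ValueError("not enough consistent motion vectors for RANSAC")
    # m = [[s*cos(theta), -s*sin(theta), x], [s*sin(theta), s*cos(theta), y]]
    x, y = m[0, 2], m[1, 2]                    # lateral and vertical translations x(t), y(t)
    theta = np.arctan2(m[1, 0], m[0, 0])       # angular translation theta(t)
    scale = np.hypot(m[0, 0], m[1, 0])         # scaling factor
    return x, y, theta, scale
```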
Shakiness metric component 108 may be configured to determine shakiness metrics for the video frames. The shakiness metrics for the video frames may be determined based on the transformation matrix and/or other information.
For example,
One or more terms of the shakiness metric determination may be normalized. For example, the lateral translation may be normalized based on a width of the video content (1/W). The vertical translation may be normalized based on a height of the video content (1/H) or an aspect ratio A (X/Y). The angular translation may be matched to the lateral translation and the vertical translation via a first matching factor (λ). The first matching factor (λ) may represent an angle spatial-normalization coefficient. The rate of change in the linear first-order approximation of the scaling factor may be matched to the lateral translation and the vertical translation via a second matching factor (μ). The second matching factor (μ) may represent a scaling normalization coefficient.
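A sketch of one way the per-frame shakiness metric might be assembled from the normalized rates of change described above is given below. The additive combination of the four terms and the clipping to the [0, 1] range are assumptions; the disclosure specifies the normalization of each term but not an exact combining formula.

```python
import numpy as np

def shakiness_metrics(x, y, theta, scale, width, height, lam=1.0, mu=1.0):
    """x, y, theta, scale: per-frame transformation parameters as 1-D arrays."""
    dx = np.abs(np.diff(x)) / width          # lateral term, normalized by frame width W
    dy = np.abs(np.diff(y)) / height         # vertical term, normalized by frame height H
    dtheta = lam * np.abs(np.diff(theta))    # angular term, matched via lambda
    ds = mu * np.abs(np.diff(scale - 1.0))   # first-order scaling term, matched via mu
    s = dx + dy + dtheta + ds
    return np.clip(np.concatenate(([0.0], s)), 0.0, 1.0)  # 0 = stable, 1 = shaky
```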
Shakiness metrics may be normalized (e.g., between 0 and 1), where 0 indicates a stable movement of image sensor(s) that captured the video content and 1 indicates a shaky movement of the image sensor(s). Shakiness metrics of the video frames of video content as a function of progress through the video content may form a shakiness curve for the video content. For example,
In some implementations, shakiness metrics may be classified into multiple levels of shakiness (e.g., not shaky, acceptable, bad). Shakiness metrics may be classified using a dynamic thresholding with hysteresis method and/or other information. A medium sliding noise value may be acquired by applying a low-pass filter on the shakiness signal, such as shown in
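The following sketch illustrates dynamic thresholding with hysteresis over the shakiness signal, classifying frames into three levels. The moving-average noise estimate, the two knee values, and the simple state machine are illustrative assumptions standing in for the multi-knee hysteresis described above.

```python
import numpy as np

def classify_shakiness(metrics, low_knee=0.2, high_knee=0.5, window=15):
    """Return one label per frame: 0 = not shaky, 1 = acceptable, 2 = bad."""
    kernel = np.ones(window) / window
    noise_floor = np.convolve(metrics, kernel, mode="same")  # low-pass sliding noise estimate
    labels, state = [], 0
    for m, n in zip(metrics, noise_floor):
        level = m - n                  # shakiness above the local noise floor
        if state < 2 and level > high_knee:
            state = 2                  # enter "bad" only above the high knee
        elif state > 0 and level < low_knee:
            state = 0                  # leave only below the low knee (hysteresis)
        elif state == 0 and level > low_knee:
            state = 1                  # "acceptable" band between the knees
        labels.append(state)
    return np.array(labels)
```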
The determination of motion vectors and shakiness metrics as described herein may enable fast and efficient determination of shakiness metrics for individual video frames of the video content. The determination of motion vectors and shakiness metrics as described herein may enable resource limited devices (e.g., mobile devices) to determine the shakiness metrics of the video frames without offloading the determination task to another computing device with more resources (e.g., server).
Shakiness threshold component 110 may be configured to obtain a shakiness threshold and/or other information. A shakiness threshold may refer to a level(s) of shakiness metric at which the video edits should be made. A shakiness threshold may indicate the level(s) of shakiness metric at which the video frames should be trimmed. Shakiness threshold component 110 may obtain a shakiness threshold based on one or more of a system default, user input, desired length of the video edit, accompanying soundtrack, video content (e.g., type of event/scene within the video content), and/or other information. In some implementations, the shakiness threshold may form a shakiness threshold line. A shakiness threshold line may include one or more of straight (horizontal, vertical, sloped) portions and/or curved portions.
Identification component 112 may be configured to identify one or more of the video frames based on the shakiness metrics, the shakiness threshold, and/or other information. One or more of the video frames may be identified based on comparison of the shakiness metrics of the video frames and the shakiness threshold and/or other information. Identified video frames may include those video frames with shakiness metric lower than, higher than, and/or equal to the shakiness threshold. For example, referring to
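A sketch of the identification step is shown below, assuming that frames with shakiness metrics below the threshold are the ones to keep; the disclosure also contemplates selecting frames whose metrics are above or equal to the threshold. The grouping of identified frames into contiguous segments is an illustrative convenience for the summary step.

```python
import numpy as np

def identify_frames(metrics, threshold):
    """Indices of video frames whose shakiness metric is below the shakiness threshold."""
    return np.flatnonzero(np.asarray(metrics) < threshold)

def identify_segments(metrics, threshold):
    """Group identified frame indices into contiguous (start, end) index ranges."""
    keep = identify_frames(metrics, threshold)
    if keep.size == 0:
        return []
    breaks = np.flatnonzero(np.diff(keep) > 1)              # gaps between kept frames
    starts = np.concatenate(([keep[0]], keep[breaks + 1]))
    ends = np.concatenate((keep[breaks], [keep[-1]]))
    return list(zip(starts.tolist(), ends.tolist()))
```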
Video summary component 114 may be configured to generate a video summary of the video content. A video summary may include a combination/concatenation of video frames identified by identification component 112. The video summary may include one or more of the identified video frames and/or other information. For example, referring to
In some implementations, video summary component 114 may be configured to apply one or more filters to remove noise from one or more video frames. In some implementations, the filters may be applied with a time scale ratio of 1 to 6. For example, the video content may have been captured at thirty frames per second, and one or more filters may be applied using a time scale of five frames. The video content may have been captured at sixty frames per second, and one or more filters may be applied using a time scale of ten frames. Other time scale ratios are contemplated.
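A sketch of the time-scale selection and a corresponding noise-removal filter follows. Applying a moving-median filter to the per-frame shakiness signal is an assumption; the disclosure specifies only the 1-to-6 ratio between the filter time scale and the capture frame rate, which yields five frames at 30 frames per second and ten frames at 60 frames per second.

```python
import numpy as np

def filter_time_scale(frames_per_second, ratio=6):
    """Five frames at 30 fps, ten frames at 60 fps, per the 1:6 time-scale ratio."""
    return max(1, round(frames_per_second / ratio))

def denoise_metrics(metrics, frames_per_second):
    """Apply a moving-median filter over the filter time scale to the shakiness signal."""
    k = filter_time_scale(frames_per_second)
    padded = np.pad(np.asarray(metrics, dtype=float), (k // 2, k - 1 - k // 2), mode="edge")
    return np.array([np.median(padded[i:i + k]) for i in range(len(metrics))])
```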
Implementations of the disclosure may be made in hardware, firmware, software, or any suitable combination thereof. Aspects of the disclosure may be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a tangible computer readable storage medium may include read only memory, random access memory, magnetic disk storage media, optical storage media, flash memory devices, and others, and a machine-readable transmission media may include forms of propagated signals, such as carrier waves, infrared signals, digital signals, and others. Firmware, software, routines, or instructions may be described herein in terms of specific exemplary aspects and implementations of the disclosure, and as performing certain actions.
Although processor 11 and storage media 12 are shown to be connected to interface 13 in
Although processor 11 is shown in
It should be appreciated that although computer components are illustrated in
The description of the functionality provided by the different computer program components described herein is for illustrative purposes, and is not intended to be limiting, as any of computer program components may provide more or less functionality than is described. For example, one or more of computer program components 102, 104, 106, 108, 110, 112, and/or 114 may be eliminated, and some or all of its functionality may be provided by other computer program components. As another example, processor 11 may be configured to execute one or more additional computer program components that may perform some or all of the functionality attributed to one or more of computer program components 102, 104, 106, 108, 110, 112, and/or 114 described herein.
The electronic storage media of storage media 12 may be provided integrally (i.e., substantially non-removable) with one or more components of system 10 and/or removable storage that is connectable to one or more components of system 10 via, for example, a port (e.g., a USB port, a Firewire port, etc.) or a drive (e.g., a disk drive, etc.). Storage media 12 may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EPROM, EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media. Storage media 12 may be a separate component within system 10, or storage media 12 may be provided integrally with one or more other components of system 10 (e.g., processor 11). Although storage media 12 is shown in
In some implementations, method 200 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices executing some or all of the operation of method 200 in response to instructions stored electronically on one or more electronic storage mediums. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operation of method 200.
Referring to
At operation 202, motion vectors for the video frames may be determined. The motion vectors may represent motion of one or more visuals captured within individual video frames. In some implementations, operation 202 may be performed by a processor component the same as or similar to motion vector component 104 (Shown in
At operation 203, a transformation matrix may be determined for the video frames. The transformation matrix may be determined based on the motion vectors. The transformation matrix may characterize rigid transformations between pairs of video frames. In some implementations, operation 203 may be performed by a processor component the same as or similar to transformation matrix component 106 (Shown in
At operation 204, shakiness metrics for the video frames may be determined. The shakiness metrics may be determined based on the transformation matrix. In some implementations, operation 204 may be performed by a processor component the same as or similar to shakiness metric component 108 (Shown in
At operation 205, a shakiness threshold may be obtained. In some implementations, operation 205 may be performed by a processor component the same as or similar to shakiness threshold component 110 (Shown in
At operation 206, one or more of the video frames may be identified based on the shakiness metrics and the shakiness threshold. In some implementations, operation 206 may be performed by a processor component the same as or similar to identification component 112 (Shown in
At operation 207, a video summary of the video content may be generated. The video summary may include one or more identified video frames. In some implementations, operation 207 may be performed by a processor component the same as or similar to video summary component 114 (Shown in
Although the system(s) and/or method(s) of this disclosure have been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred implementations, it is to be understood that such detail is solely for that purpose and that the disclosure is not limited to the disclosed implementations, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present disclosure contemplates that, to the extent possible, one or more features of any implementation can be combined with one or more features of any other implementation.