At least one embodiment of this disclosure relates generally to techniques for filming, producing, editing, and/or presenting media content based on data created by one or more visual and/or non-visual sensors.
Video production is the process of creating video by capturing moving images, and then creating combinations and reductions of parts of the video in live production and post-production. Finished video productions range in size and can include, for example, television programs, television commercials, corporate videos, event videos, etc. The type of recording device used to capture video often changes based on the intended quality of the finished video production. For example, one individual may use a mobile phone to record a short video clip that will be uploaded to social media (e.g., Facebook or Instagram), while another individual may use a multiple-camera setup to shoot a professional-grade video clip.
Video editing software is often used to handle the post-production video editing of digital video sequences. Video editing software typically offers a range of tools for trimming, splicing, cutting, and arranging video recordings (also referred to as “video clips”) across a timeline. Examples of video editing software include Adobe Premiere Pro, Final Cut Pro X, iMovie, etc. However, video editing software may be difficult to use, particularly for those individuals who capture video using a personal computing device (e.g., a mobile phone) and only intend to upload the video to social media or retain it for personal use.
Various objects, features, and characteristics will become apparent to those skilled in the art from a study of the Detailed Description in conjunction with the appended claims and drawings, all of which form a part of this specification. While the accompanying drawings include illustrations of various embodiments, the drawings are not intended to limit the claimed subject matter.
The figures depict various embodiments described throughout the Detailed Description for the purposes of illustration only. While specific embodiments have been shown by way of example in the drawings and are described in detail below, one skilled in the art will readily recognize the subject matter is amenable to various modifications and alternative forms without departing from the principles of the invention described herein. Accordingly, the claimed subject matter is intended to cover all modifications, equivalents, and alternatives falling within the scope of the invention as defined by the appended claims.
Introduced herein are systems and techniques for improving media content production and consumption by utilizing metadata associated with the relevant media content. The metadata can include, for example, sensor data created by a visual sensor (e.g., a camera or light sensor) and/or a non-visual sensor (e.g., an accelerometer, gyroscope, magnetometer, barometer, global positioning system module, or inertial measurement unit) that is connected to a filming device, an operator device for controlling the filming device, or some other computing device associated with a user of the filming device.
Such techniques have several applications, examples of which are described below.
One skilled in the art will recognize that the techniques described herein can be implemented independent of the type of filming device used to capture raw video. For example, such techniques could be applied to an unmanned aerial vehicle (UAV) copter, an action camera (e.g., a GoPro or Garmin VIRB camera), a mobile phone, a tablet, or a personal computer (e.g., a desktop or laptop computer). More specifically, a user of an action camera may wear a tracker (also referred to more simply as a “computing device” or an “operator device”) that generates sensor data, which can be used to identify interesting segments of raw video captured by the action camera.
Video compositions (and other media content) can be created using different “composition recipes” that specify an appropriate style or mood and that allow video content to be timed to match audio content (e.g., music and sound effects). While the “composition recipes” allow videos to be automatically created (e.g., by a network-accessible platform or a computing device, such as a mobile phone, tablet, or personal computer), some embodiments enable additional levels of user input. For example, an editor may be able to reorder or discard certain segments, select different raw video clips, and use video editing tools to modify color, warping, stabilization, etc.
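By way of illustration and not limitation, the following Python sketch shows one way a “composition recipe” might be represented as a simple data structure; the class name, field names, and values are hypothetical and are not drawn from any particular embodiment.

```python
from dataclasses import dataclass, field

@dataclass
class CompositionRecipe:
    """Hypothetical representation of a 'composition recipe'."""
    name: str                       # e.g., "high-energy action"
    mood: str                       # style/mood the finished video should convey
    audio_track: str                # identifier of the backing audio (music, sound effects)
    beat_times: list = field(default_factory=list)   # seconds at which cuts should land
    max_segment_length: float = 4.0                  # longest allowed clip, in seconds
    transitions: list = field(default_factory=list)  # e.g., ["crossfade", "hard cut"]

# Example: a recipe that times cuts to musical beats
recipe = CompositionRecipe(
    name="high-energy action",
    mood="energetic",
    audio_track="soundtrack.mp3",
    beat_times=[0.0, 1.9, 3.8, 5.7],
    transitions=["hard cut"],
)
```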
Also introduced herein are techniques for creating video composition templates that include interesting segments of video and/or timestamps, and then storing the video composition templates to delay the final composition of a video composition from a template until presentation. This enables the final composition of the video to be as personalized as possible using, for example, additional media streams that are selected based on metadata (e.g., sensor data) and viewer interests/characteristics.
Filming characteristics or parameters of the filming device can also be modified based on sensor-driven events. For example, sensor measurements may prompt changes to be made to the positioning, orientation, or movement pattern of the filming device. As another example, sensor measurements may cause the filming device to modify its filming technique (e.g., by changing the resolution, focal point, etc.). Accordingly, the filming device (or some other computing device) may continually or periodically monitor the sensor measurements to determine whether they exceed an upper threshold value, fall below a lower threshold value, or exceed a certain variation in a specified time period.
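Purely as an illustration, the following Python sketch shows one way sensor measurements could be checked against an upper threshold, a lower threshold, and a maximum variation within a specified time window; the function name, parameters, and numeric values are hypothetical.

```python
def exceeds_thresholds(samples, upper=None, lower=None, max_variation=None, window=None):
    """Return True if the most recent sensor samples trip any configured trigger.

    samples: iterable of (timestamp_seconds, value) pairs, oldest first.
    upper/lower: absolute threshold values.
    max_variation: allowed spread (max - min) within the trailing time window.
    window: length of the trailing window, in seconds.
    """
    samples = list(samples)
    if not samples:
        return False
    latest_t, latest_v = samples[-1]
    if upper is not None and latest_v > upper:
        return True
    if lower is not None and latest_v < lower:
        return True
    if max_variation is not None and window is not None:
        recent = [v for t, v in samples if latest_t - t <= window]
        if recent and (max(recent) - min(recent)) > max_variation:
            return True
    return False

# Example: barometric altitude samples; trigger on a >5 m swing within 2 seconds
altitude = [(0.0, 100.0), (0.5, 100.3), (1.0, 103.9), (1.5, 106.2)]
if exceeds_thresholds(altitude, max_variation=5.0, window=2.0):
    print("adjust filming parameters")  # e.g., reposition or change resolution
```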
Brief definitions of terms, abbreviations, and phrases used throughout this disclosure are given below.
As used herein, the terms “connected,” “coupled,” or any variant thereof, means any connection or coupling, either direct or indirect, between two or more elements; the coupling or connection between the elements can be physical, logical, or a combination thereof. For example, two components may be coupled directly to one another or via one or more intermediary channels or components. Additionally, the words “herein,” “above,” “below,” and words of similar import shall refer to this application as a whole and not to any particular portions of this application.
Examples of the filming device 102 include an action camera, an unmanned aerial vehicle (UAV) copter, a mobile phone, a tablet, or a personal computer (e.g., a desktop or laptop computer). Examples of the operator device 104 include a stand-alone or wearable remote control for controlling the filming device 102. Examples of the computing device 106 include a smartwatch (e.g., an Apple Watch or Pebble), an activity/fitness tracker (e.g., made by Fitbit, Garmin, or Jawbone), or a health tracker (e.g., a heart rate monitor).
Each of these devices can upload streams of data to the network-accessible platform 100, either directly or indirectly (e.g., via the filming device 102 or operator device 104, which may maintain a communication link with the network-accessible platform 100). The data streams can include video, audio, user-inputted remote controls, Global Positioning System (GPS) information (e.g., user speed, user path, or landmark-specific or location-specific information), inertial measurement unit (IMU) activity, flight state of the filming device, voice commands, audio intensity, etc. For example, the filming device 102 may upload video and audio, while the computing device 106 may upload IMU activity and heart rate measurements. Consequently, the network-accessible platform 100 may receive parallel rich data streams from multiple sources simultaneously or sequentially.
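As a non-limiting example, one possible form of such an upload is sketched below in Python; the payload fields are hypothetical and merely suggest how sensor samples could be packaged alongside a reference to the associated video.

```python
import json
import time

# Hypothetical payload for one sensor-data upload; field names are illustrative only.
payload = {
    "device_id": "filming-device-102",
    "stream_type": "imu",
    "captured_at": time.time(),
    "samples": [
        {"t": 0.00, "accel": [0.1, 9.8, 0.0], "gyro": [0.0, 0.0, 0.1]},
        {"t": 0.02, "accel": [0.3, 9.7, 0.1], "gyro": [0.0, 0.1, 0.1]},
    ],
    "linked_video": "clip_0001.mp4",   # lets the platform align IMU data with video
}

body = json.dumps(payload)
# The serialized body could then be transmitted to the platform (e.g., over HTTP).
```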
The network-accessible platform 100 may also be communicatively coupled to an editing device 108 (e.g., a mobile phone, tablet, or personal computer) on which an editor views content recorded by the filming device 102, the operator device 104, and/or the computing device 106. The editor could be, for example, the same individual as the user of the filming device 102 (and, thus, the editing device 108 could be the same computing device as the filming device 102, the operator device 104, or the computing device 106). The network-accessible platform 100 is connected to one or more computer networks, which may include local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), cellular networks, and/or the Internet.
Various system architectures could be used to build the network-accessible platform 100. Accordingly, the content may be viewable and editable by the editor using the editing device 108 through one or more of a web browser, software program, mobile application, and over-the-top (OTT) application. The network-accessible platform 100 may be executed by cloud computing services operated by, for example, Amazon Web Services (AWS) or a similar technology. Oftentimes, a host server 110 is responsible for supporting the network-accessible platform and generating interfaces (e.g., editing interfaces and compilation timelines) that can be used by the editor to produce media content (e.g., a video composition) using several different data streams as input. As further described below, some or all of the production/editing process may be automated by the network-accessible platform 100. For example, media content (e.g., a video) could be automatically produced by the network-accessible platform 100 based on events discovered within sensor data uploaded by the filming device 102, the operator device 104, and/or the computing device 106.
The host server 110 may be communicatively coupled (e.g., across a network) to one or more servers 112 (or other computing devices) that include media content and other assets (e.g., user information, computing device information, social media credentials). This information may be hosted on the host server 110, the server(s) 112, or distributed across both the host server 110 and the server(s) 112.
Production (stage 2) is the process of creating finished media content from combinations and reductions of parts of raw media. This can include the production of videos that range from professional-grade video clips to personal videos that will be uploaded to social media (e.g., Facebook or Instagram). Production (also referred to as the “media editing process”) is often performed in multiple stages (e.g., live production and post-production).
The finished media content can then be presented to one or more individuals and consumed (stage 3). For instance, the finished media content may be shared with individual(s) through one or more distribution channels, such as via social media, text messages, electronic mail (“e-mail”), or a web browser. Accordingly, in some embodiments the finished media content is converted into specific format(s) so that it is compatible with these distribution channel(s).
The editor would then typically identify interesting segments of media content by reviewing each clip of raw media content (step 2). Conventional media editing platforms typically require that the editor flag or identify interesting segments in some manner, and then pull the interesting segments together in a given order (step 3). Said another way, the editor can form a “story” by arranging and combining segments of raw media content in a particular manner. The editor may also delete certain segments of raw media content when creating the finalized media content.
In some instances, the editor may also perform one or more detailed editing techniques (step 4). Such techniques include trimming raw media segments, aligning multiple types of raw media (e.g., audio and video that have been separately recorded), applying transitions and other special effects, etc.
Introduced herein are systems and techniques for automatically producing media content (e.g., a video composition) using several inputs uploaded by one or more computing devices (e.g., the filming device 102, the operator device 104, and/or the computing device 106).
Video compositions (and other media content) can be created using different “composition recipes” that specify an appropriate style or mood and that allow video content to be timed to match audio content (e.g., music and sound effects). While the “composition recipes” allow videos to be automatically created (e.g., by the network-accessible platform 100 or another computing device), some embodiments enable additional levels of user input, such as reordering or discarding certain segments, selecting different raw video clips, or using video editing tools to modify color, warping, stabilization, etc.
As further described below, some embodiments also enable the “composition recipes” and “raw ingredients” (i.e., the content needed to complete the “composition recipes,” such as the timestamps, media segments, and raw input media) to be saved as a templated story that can be subsequently enhanced. For example, the templated story could be enabled at the time of presentation with social content (or other related content) that is appropriate for the consumer/viewer. Accordingly, sensor data streams could be used to dynamically improve acquisition, production, and presentation of (templated) media content.
Accordingly, the user can instead spend time reviewing edited media content (e.g., video compositions) created from automatically-identified segments of media content. In some instances, the user may also perform further editing of the edited media content. For example, the user may reorder or discard certain segments, or select different raw video clips. As another example, the user may use video editing tools to perform certain editing techniques, such as modifying color, warping, stabilization, etc.
Raw logs of sensor information 506 can also be uploaded by the filming device, operator device, and/or another computing device. For example, an action camera or a mobile phone may upload video 508 that is synced with Global Positioning System (GPS) information. Other information can also be uploaded to, or retrieved by, a network-accessible platform, including user-inputted remote controls, GPS information (e.g., user speed, user path), inertial measurement unit (IMU) activity, voice commands, audio intensity, etc. Certain information may only be requested by the network-accessible platform in some embodiments (e.g., the flight state of the filming device when the filming device is a UAV copter). Audio 510, such as songs and sound effects, could also be retrieved by the network-accessible platform (e.g., from the server(s) 112).
The importance of each of these inputs can be ranked using one or more criteria. The criteria may be used to identify which input(s) should be used to automatically produce media content on behalf of the user. The criteria can include, for example, camera distance, user speed, camera speed, video stability, tracking accuracy, chronology, and deep learning.
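For purposes of illustration only, the following Python sketch ranks candidate segments with a weighted sum of the criteria listed above; the weights and criterion names are hypothetical, and a deployed system might instead learn such weights (e.g., via the deep-learning criterion mentioned above).

```python
# Hypothetical weights; in practice these could be hand-tuned or learned.
WEIGHTS = {
    "camera_distance": 0.15,
    "user_speed": 0.25,
    "camera_speed": 0.10,
    "video_stability": 0.20,
    "tracking_accuracy": 0.20,
    "chronology": 0.10,
}

def rank_segments(segments):
    """Order candidate segments by a weighted sum of normalized criterion scores.

    segments: list of dicts mapping criterion name -> score in [0, 1].
    """
    def score(segment):
        return sum(WEIGHTS[name] * segment.get(name, 0.0) for name in WEIGHTS)
    return sorted(segments, key=score, reverse=True)

candidates = [
    {"id": "clip_a", "user_speed": 0.9, "video_stability": 0.4, "tracking_accuracy": 0.8},
    {"id": "clip_b", "user_speed": 0.3, "video_stability": 0.9, "tracking_accuracy": 0.7},
]
ranked = rank_segments(candidates)  # clip_a ranks first under these weights
```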
More specifically, raw sensor data 506 uploaded to the network-accessible platform by the filming device, operator device, and/or other computing device can be used to automatically identify relevant segments of raw video 502 (step 512). Media content production and/or presentation may be based on sensor-driven or sensor-recognized events. Accordingly, the sensor(s) responsible for generating the raw sensor data 506 used to produce media content need not be housed within the filming device responsible for capturing the raw video 502. For example, interesting segments of raw video 502 can be identified based on large changes in acceleration as detected by an accelerometer or large changes in elevation as detected by a barometer. As noted above, the accelerometer and barometer may be connected to (or housed within) the filming device, operator device, and/or other computing device. One skilled in the art will recognize that while accelerometers and barometers have been used as examples, other sensors can be (and often are) used. In some embodiments, the interesting segment(s) of raw video identified by the network-accessible platform are ranked using the criteria discussed above (step 514).
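Purely as an illustration, the following Python sketch identifies candidate time windows by looking for large sample-to-sample changes in a sensor stream (e.g., acceleration or barometric altitude); the threshold and padding values are hypothetical.

```python
def find_interesting_windows(sensor_samples, threshold, pad_seconds=2.0):
    """Return (start, end) time windows around large sample-to-sample changes.

    sensor_samples: list of (timestamp_seconds, value) pairs, oldest first.
    threshold: minimum absolute change between consecutive samples that counts
               as an event (e.g., a spike in acceleration or altitude).
    pad_seconds: how much video to keep on either side of each event.
    """
    windows = []
    for (t0, v0), (t1, v1) in zip(sensor_samples, sensor_samples[1:]):
        if abs(v1 - v0) >= threshold:
            windows.append((max(0.0, t0 - pad_seconds), t1 + pad_seconds))
    return windows

# Example: accelerometer magnitude (m/s^2) sampled alongside the raw video timeline
accel = [(10.0, 9.8), (10.5, 9.9), (11.0, 24.5), (11.5, 20.0)]
print(find_interesting_windows(accel, threshold=10.0))  # [(8.5, 13.0)]
```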
The network-accessible platform can then automatically create a video composition that includes at least some of the interesting segment(s) on behalf of the user of the filming device (step 516). For example, the video composition could be created by following different “composition recipes” that allow the style of the video composition to be tailored (e.g., to a certain mood or theme) and timed to match certain music and other audio inputs (e.g., sound effects). After production of the video composition is completed, a media file (often a multimedia file) is output for further review and/or modification by the editor (step 518).
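By way of example and not limitation, the following Python sketch shows how ranked segments might be fitted to the cut points of a “composition recipe” to form a simple edit decision list; the structure of the output is hypothetical.

```python
def build_edit_decision_list(ranked_windows, beat_times):
    """Pair the best candidate windows with the recipe's cut points.

    ranked_windows: list of (start, end) tuples in the raw video, best first.
    beat_times: cut points (seconds) on the audio track, ascending.
    Returns a list of dicts describing which raw segment plays between cuts.
    """
    edl = []
    for i, (cut_in, cut_out) in enumerate(zip(beat_times, beat_times[1:])):
        if i >= len(ranked_windows):
            break
        start, end = ranked_windows[i]
        slot = cut_out - cut_in                   # how long this slot lasts
        end = min(end, start + slot)              # trim the clip to fit the slot
        edl.append({"source_in": start, "source_out": end,
                    "timeline_in": cut_in, "timeline_out": cut_in + (end - start)})
    return edl

# Two interesting windows fitted to two beat-to-beat slots of the audio track
edl = build_edit_decision_list([(8.5, 13.0), (42.0, 45.5)], beat_times=[0.0, 1.9, 3.8])
```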
In some embodiments, one or more editors guide the production of the video composition by manually changing the “composition recipe” or selecting different audio files or video segments. Some embodiments also enable the editor(s) to take additional steps to modify the video composition (step 520). For example, the editor(s) may be able to reorder interesting segment(s), choose different raw video segments, and utilize video editing tools to modify color, warping, and stabilization.
After the editor(s) have finished making any desired modifications, the video composition is stabilized into its final form. In some embodiments, post-processing techniques, such as dewarping and color correction, are then applied to the stabilized video composition. The final form of the video composition may be cut, recorded, and/or downscaled for easier sharing on social media (e.g., Facebook, Instagram, and YouTube) (step 522). For example, video compositions may be automatically downscaled to 720p based on a preference previously specified by the editor(s) or the owner/user of the filming device.
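As one non-limiting possibility, a finished composition could be downscaled to 720p by invoking the ffmpeg command-line tool (assumed here to be installed); the Python sketch below uses placeholder file names.

```python
import subprocess

def downscale_to_720p(src, dst):
    """Downscale a finished composition to 720p for easier sharing.

    scale=-2:720 keeps the aspect ratio while forcing an even width,
    and the audio stream is copied unchanged.
    """
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-vf", "scale=-2:720", "-c:a", "copy", dst],
        check=True,
    )

# downscale_to_720p("composition_final.mp4", "composition_720p.mp4")
```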
Additionally or alternatively, the network-accessible platform may be responsible for creating video composition templates that include interesting segments of the raw video 502 and/or timestamps, and then storing the video composition templates to delay the final composition of a video composition from a template until presentation. This enables the final composition of the video to be as personalized as possible using, for example, additional media streams that are selected based on metadata (e.g., sensor data) and viewer interests/characteristics (e.g., derived from social media).
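For purposes of illustration only, the following Python sketch stores a composition template as JSON, including placeholder slots that could be filled at presentation time with viewer-specific content; the field names are hypothetical.

```python
import json

# Hypothetical template: which raw segments to use and where viewer-specific
# content may be substituted in at presentation time.
template = {
    "recipe": "high-energy action",
    "segments": [
        {"source": "clip_0001.mp4", "in": 8.5, "out": 13.0},
        {"source": "clip_0003.mp4", "in": 42.0, "out": 45.5},
    ],
    "placeholders": [
        # Filled at presentation time, e.g., with the viewer's own media from
        # the same GPS location or time window.
        {"timeline_in": 6.0, "duration": 3.0, "select_by": ["location", "time_window"]},
    ],
}

with open("composition_template.json", "w") as f:
    json.dump(template, f, indent=2)
```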
As video compositions are produced, machine learning techniques can be implemented that allow the network-accessible platform to improve its ability to acquire, produce, and/or present media content (step 524). For example, the network-accessible platform may analyze how different editors compare and rank interesting segment(s) (e.g., by determining why certain identified segments are not considered interesting, or by determining how certain non-identified segments that are considered interesting were missed) to help improve the algorithms used to identify and/or rank interesting segments of raw video using sensor data. Similarly, editor(s) can also reorder interesting segments of video compositions and remove undesired segments to better train the algorithms. Machine learning can be performed offline (e.g., where an editor compares multiple segments and indicates which one is most interesting) or online (e.g., where an editor manually reorders segments within a video composition and removes undesired clips). The results of both offline and online machine learning processes can be used to train a machine learning module executed by the network-accessible platform for ranking and/or composition ordering.
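By way of illustration, the following Python sketch shows one simple way an editor's keep/discard decisions could serve as training labels for re-scoring segments, here using scikit-learn's logistic regression as a stand-in for the machine learning module described above; the feature values and labels are invented for the example.

```python
from sklearn.linear_model import LogisticRegression

# Features per segment (the same criteria used for ranking) and labels derived
# from editor feedback: 1 if the editor kept the segment, 0 if it was discarded.
X = [
    [0.9, 0.4, 0.8],   # [user_speed, video_stability, tracking_accuracy]
    [0.3, 0.9, 0.7],
    [0.2, 0.5, 0.3],
    [0.8, 0.7, 0.9],
]
y = [1, 0, 0, 1]

model = LogisticRegression()
model.fit(X, y)

# The learned model can then re-score new candidate segments.
print(model.predict_proba([[0.7, 0.6, 0.8]])[0][1])  # probability the segment is "interesting"
```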
One skilled in the art will recognize that although the process 500 described herein is executed by a network-accessible platform, the same process could also be executed by another computing device, such as a mobile phone, tablet, or personal computer (e.g., laptop or desktop computer).
Moreover, unless contrary to physical possibility, it is envisioned that the steps described above may be performed in various sequences and combinations. For instance, an editor may accept or discard individual segments that are identified as interesting before the video composition is formed. Other steps could also be included in some embodiments.
In some embodiments, the video/image data uploaded by these computing devices is also synced (step 608). That is, the video/image/audio data uploaded by each source may be temporally aligned (e.g., along a timeline) so that interesting segments of media can be more intelligently cropped and mixed. Temporal alignment permits the identification of interesting segments of a media stream when matched with secondary sensor data streams. Temporal alignment (which may be accomplished by timestamps or tags) may also be utilized in the presentation-time composition of a story. For example, a computing device may compose a story by combining images or video from non-aligned times of a physical location (e.g., as defined by GPS coordinates). However, the computing device may also generate a story based on other videos or photos that are time-aligned, which may be of interest to, or related to, the viewer (e.g., a story that depicts what each member of a family might have been doing within a specific time window).
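Purely as an illustration, the following Python sketch temporally aligns video frames with the nearest sensor samples by timestamp; the sample data are hypothetical.

```python
import bisect

def align_streams(video_frames, sensor_samples):
    """Pair each video frame with the nearest-in-time sensor sample.

    video_frames: list of (timestamp_seconds, frame_id), sorted by time.
    sensor_samples: list of (timestamp_seconds, value), sorted by time.
    """
    sensor_times = [t for t, _ in sensor_samples]
    aligned = []
    for t, frame_id in video_frames:
        i = bisect.bisect_left(sensor_times, t)
        # Choose whichever neighboring sample is closer in time.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_samples)]
        j = min(candidates, key=lambda k: abs(sensor_times[k] - t))
        aligned.append((frame_id, sensor_samples[j][1]))
    return aligned

frames = [(0.00, "f0"), (0.04, "f1"), (0.08, "f2")]
imu = [(0.00, 9.8), (0.05, 10.2), (0.10, 9.9)]
print(align_streams(frames, imu))  # [('f0', 9.8), ('f1', 10.2), ('f2', 9.9)]
```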
The remainder of the process 600 may be similar to the process 500 described above.
The processor(s) 710 is/are the central processing unit (CPU) of the computing device 700 and thus controls the overall operation of the computing device 700. In certain embodiments, the processor(s) 710 accomplishes this by executing software or firmware stored in memory 720. The processor(s) 710 may be, or may include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), trusted platform modules (TPMs), or the like, or a combination of such devices.
The memory 720 is or includes the main memory of the computing device 700. The memory 720 represents any form of random access memory (RAM), read-only memory (ROM), flash memory, or the like, or a combination of such devices. In use, the memory 720 may contain code 770 containing instructions for implementing the techniques disclosed herein.
Also connected to the processor(s) 710 through the interconnect 730 are a network adapter 740 and a storage adapter 750. The network adapter 740 provides the computing device 700 with the ability to communicate with remote devices over a network and may be, for example, an Ethernet adapter or Fibre Channel (FC) adapter. The network adapter 740 may also provide the computing device 700 with the ability to communicate with other computers. The storage adapter 750 allows the computing device 700 to access persistent storage and may be, for example, a Fibre Channel (FC) adapter or SCSI adapter.
The code 770 stored in memory 720 may be implemented as software and/or firmware to program the processor(s) 710 to carry out actions described above. In certain embodiments, such software or firmware may be initially provided to the computing device 700 by downloading it from a remote system (e.g., via the network adapter 740).
The techniques introduced herein can be implemented by, for example, programmable circuitry (e.g., one or more microprocessors) programmed with software and/or firmware, or entirely in special-purpose hardwired circuitry, or in a combination of such forms. Special-purpose hardwired circuitry may be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), etc.
Software or firmware for use in implementing the techniques introduced here may be stored on a machine-readable storage medium and may be executed by one or more general-purpose or special-purpose programmable microprocessors. A “machine-readable storage medium”, as the term is used herein, includes any mechanism that can store information in a form accessible by a machine (a machine may be, for example, a computer, network device, cellular phone, personal digital assistant (PDA), manufacturing tool, any device with one or more processors, etc.). For example, a machine-readable storage medium includes recordable/non-recordable media (e.g., read-only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; etc.).
The term “logic”, as used herein, can include, for example, programmable circuitry programmed with specific software and/or firmware, special-purpose hardwired circuitry, or a combination thereof.
Reference in this specification to “various embodiments” or “some embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Alternative embodiments (e.g., referenced as “other embodiments”) are not mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.
This application claims priority to U.S. Provisional Patent Application No. 62/416,600, filed Nov. 2, 2016, the entire contents of which are herein incorporated by reference in their entirety.