This patent document relates to automated post-production editing of user-generated multimedia content, including audio, video, or multimedia products that include moving images.
User-generated content, also known as user-created content, is content posted by users on online platforms. The advent of user-generated content marks a shift from online content created by media organizations to platforms that provide facilities for amateurs to publish their own content. With the rapid development of mobile devices capable of capturing content at a variety of times and places, together with various social media platforms, the amount of user-generated content has increased at a staggering pace.
Described herein are techniques, subsystems and systems that allow automated post-production editing of user-generated content, thereby enabling amateur users to easily create professionally edited multimedia contents and to distribute the contents among multiple social media platforms. The disclosed techniques can be used by amateurs to automatically organize footages captured from multiple devices according to a timeline of an event and to produce professionally edited content without the need to understand complex editing commands.
In one example aspect, the disclosed technology can be implemented to provide a computer-implemented method for performing post-production editing that includes receiving one or more footages of an event from at least one user. The method includes constructing, based on information about the event, a script to indicate a structure of multiple temporal units of the one or more footages, and extracting semantic meaning from the one or more footages based on a multimodal analysis comprising an audio analysis and a video analysis. The method also includes adding editing instructions to the script based on the structure of the multiple temporal units and the semantic meaning extracted from the one or more footages, and performing editing operations based on the editing instructions to generate an edited multimedia content based on the one or more footages.
In another example aspect, the disclosed technology can be implemented to provide a post-production editing platform that includes a user interface configured to receive one or more footages of an event from at least one user. The platform also includes one or more processors configured to construct, based on information about the event, a script to indicate a structure of multiple temporal units of the one or more footages. The one or more processors are configured to extract semantic meaning from the one or more footages based on at least an audio analysis and a video analysis of the one or more footages, add editing instructions to the script based on the structure of the multiple temporal units and the semantic meaning extracted from the one or more footages, and perform editing operations based on the editing instructions to generate an edited multimedia content based on the one or more footages.
These, and other, aspects are described in the present document.
Rapid development of mobile devices and social media platforms has led to a staggering amount of user-generated contents such as videos and other multimedia materials. Yet, the vast majority of user-generated contents tend to be poorly edited. For example, many amateur video materials may be edited with only a handful of editing effects, and significant improvements could be made with additional editing and enhancements. Unlike professionally produced video materials and multimedia contents, amateur user-generated contents often do not come with a carefully prepared production script or a clear storyline. Oftentimes, individuals capture events from different angles spontaneously, resulting in digital video footages that are neither synchronized nor aligned with one another. Various video editing software programs available to amateurs can be limited in terms of editing functions and performance, while professional video editing software programs are pricey and complex to use. Therefore, post-production editing of user-generated contents from multiple sources continues to be a challenge for amateur users seeking to produce good-quality amateur videos and multimedia materials.
This patent document discloses techniques that can be implemented in various embodiments to allow fully automated post-production editing of user-generated contents, thereby enabling amateur users to create, with ease, high-quality multimedia contents that have the feel of a professionally edited video. The disclosed techniques can be implemented to provide interactive and iterative editing of the contents using simple user interface controls to achieve the editing effects desired by the users.
In some embodiments, the disclosed techniques can be implemented as a post-production editing platform that includes one or more of the following subsystems:
1. User Interface: The post-production editing platform provides a user interface that allows users to upload footages captured using one or more devices. Such a user interface may be structured to enable users to provide some basic information about the captured subject matter, such as the type of the event, the number of devices used to capture the data, and the time and/or location of the event. Such user-provided basic information can subsequently be used to facilitate the creation of the desired user-edited multimedia contents. The user interface can also be configured to enable users to select a desired editing template, based on the nature of the event, from different editing templates tailored for different types of events. For example, for a wedding event, the platform can provide several post-production editing templates specifically designed for weddings for the users to choose from. Alternatively, or in addition, the platform can select a default template to use based on the information provided by the user.
2. Content Reconstruction: Based on the information provided by the user, the Content Reconstruction part of the post-production editing platform performs preliminary content analysis on the footages to determine the scene and/or shot structure of the footages according to the timeline.
3. Semantic Analysis: After determining the scene and/or shot structure of the footages, the Semantic Analysis part of the platform can further apply semantic analysis to the footages to obtain details of each scene/shot. For example, audio data can be converted to closed captions of the conversations; facial recognition can be performed to identify main roles that appear in the footages. Based on the scene/shot structure and the results of the semantic analysis, the platform can construct a script that outlines the storyline, timeline, roles, and devices involved in capturing the raw data.
4. Automated Post-production Editing: Once the script is constructed, post-production editing can be performed fully automatically by the Automated Post-production Editing Module of the platform. For example, based on the template selected by the user, the Automated Post-production Editing module of the platform can modify the generated script to add appropriate editing instructions. Certain scenes and/or shots can be cut while certain artistic effects can be added as transitions between the scenes.
5. Interactive Refinement: The generated script also provides the flexibility of interactive refinement when the user would like to make custom editing changes to the content that differ from what has been defined in the template. The platform can provide an Interactive Refinement module with simple, intuitive user interface controls to enable the user to modify the editing effects.
6. Packaging and Release: The edited content can be packaged to appropriate format(s) based on the target social media platforms and distributed accordingly.
The post-production editing platform can be implemented as a stand-alone software program or a web service. Details of the above subsystems are further discussed in connection with the accompanying figures.
In some embodiments, the platform can perform a quick facial recognition on part of the footages to identify the main characters involved in the event. For example, if the event involves several main characters (e.g., the bride and the groom in a wedding), the platform can analyze part of the footages to identify the bride and the groom. One way to implement this identification is to provide a user interface that enables the user to upload photos of the main characters (e.g., the bride and the groom) so that the platform can apply facial recognition using the faces in the uploaded photos to correctly identify the characters in the videos. In some embodiments, after the platform identifies several main characters, the user can be prompted to provide or input the names of these identified characters.
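For illustration only, the sketch below matches user-supplied reference photos against sampled video frames to record where each main character appears. It assumes the open-source face_recognition and OpenCV libraries; the file names, sampling interval, and matching tolerance are illustrative assumptions rather than parts of the disclosed platform.

```python
# Sketch: match user-supplied reference photos (e.g., bride/groom) against
# sampled video frames to tag where each main character appears.
import cv2                    # pip install opencv-python
import face_recognition       # pip install face_recognition

def find_character_appearances(video_path, reference_photos, sample_every_n=30):
    """reference_photos: dict mapping character name -> path to a clear photo."""
    known = {}
    for name, photo in reference_photos.items():
        image = face_recognition.load_image_file(photo)
        encodings = face_recognition.face_encodings(image)
        if encodings:
            known[name] = encodings[0]

    appearances = {name: [] for name in known}
    capture = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if frame_idx % sample_every_n == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            for enc in face_recognition.face_encodings(rgb):
                for name, ref_enc in known.items():
                    if face_recognition.compare_faces([ref_enc], enc, tolerance=0.6)[0]:
                        appearances[name].append(frame_idx)
        frame_idx += 1
    capture.release()
    return appearances

# Example: find_character_appearances("clip.mp4", {"bride": "bride.jpg", "groom": "groom.jpg"})
```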
In some embodiments, the platform can determine an appropriate template for the project based on the information provided by the user. The template can provide a default storyline, along with a set of background music clips and/or artistic effects. In some embodiments, the user can select a template from a set of available templates. The user can also make changes to the template (e.g., replacing music clips or editing effects) either before any processing is performed on the footages or after the footages are edited. For an event that does not have a well-defined structure, there may not be any available template, in which case the user can be prompted to provide a structure. For example, the user can provide descriptions for a list of scenes based on the time sequence and different locations of the event.
Once the project is created, the user can also invite other users to participate in the project, e.g., inviting friends or family members to the project so that the invited friends or family members can upload additional content captured from different devices. The platform can determine the number of devices used to produce the contents based on the number of user uploads and/or the metadata associated with the footages.
As part of the content reconstruction, the platform then performs video segmentation to divide the footages into smaller segments in temporal units of shots and/or scenes. A shot is a sequence of frames captured uninterruptedly by one camera. Multiple shots that are produced at the same location and/or time are grouped into a scene. The platform can perform shot transition detection to determine any abrupt or gradual transitions in the content and split the footages into shots. The platform can further adopt different algorithms, such as content-aware detection and/or threshold detection, to determine whether a scene change has occurred so as to group relevant shots into the same scene. A tree structure that includes multiple scenes, each scene including multiple shots, can be constructed to represent the footages.
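As a minimal sketch of a threshold-style shot transition detector, the following code compares color histograms of consecutive frames and marks a shot boundary when the similarity drops sharply. It assumes OpenCV; the histogram bin counts and the threshold value are illustrative.

```python
# Sketch: split a footage file into shots by thresholding frame-to-frame
# histogram similarity (a simple form of threshold/content-aware detection).
import cv2

def detect_shot_boundaries(video_path, threshold=0.5):
    capture = cv2.VideoCapture(video_path)
    boundaries, prev_hist, frame_idx = [0], None, 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Correlation near 1.0 means similar frames; a sharp drop marks a cut.
            similarity = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if similarity < threshold:
                boundaries.append(frame_idx)
        prev_hist, frame_idx = hist, frame_idx + 1
    capture.release()
    return boundaries  # frame indices where new shots begin
```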
In some embodiments, the time information provided by the users can be inaccurate. Also, the time information included in the metadata may not match perfectly across devices because the devices were not synchronized. The platform can perform preliminary object/character/gesture recognition to align the shots based on the content of the shots (e.g., when the same character or the same gesture appears in two different video clips). Furthermore, audio data can be used to align the shots in the time domain. When the same sound appears at slightly different time points in different clips, the platform can synchronize the clips and/or shots based on the occurrence of the sound.
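One common way to implement such audio-based synchronization is to cross-correlate the audio tracks of two clips and take the lag with the highest correlation as the time offset. The sketch below assumes NumPy/SciPy and that the audio has already been extracted as mono waveforms at a common sample rate; it is an illustrative approach, not necessarily the exact method used by the platform.

```python
# Sketch: estimate the time offset between two clips by cross-correlating
# their audio tracks (e.g., the same cheer captured by two devices).
import numpy as np
from scipy.signal import correlate

def estimate_offset_seconds(audio_a, audio_b, sample_rate):
    """audio_a, audio_b: mono waveforms (NumPy arrays) at the same sample rate."""
    a = (audio_a - np.mean(audio_a)) / (np.std(audio_a) + 1e-9)
    b = (audio_b - np.mean(audio_b)) / (np.std(audio_b) + 1e-9)
    xcorr = correlate(a, b, mode="full")
    lag = int(np.argmax(xcorr)) - (len(b) - 1)
    # Positive lag: the shared sound appears later in audio_a than in audio_b,
    # so clip A should be shifted earlier by lag / sample_rate seconds to align.
    return lag / sample_rate
```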
The platform can then start to build a script based on the preliminary information and the time-domain alignment/synchronization. Table 1 shows an example initial script constructed by the post-production editing system corresponding to the reconstructed scene/shot structure described above.
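Since Table 1 is not reproduced here, the snippet below gives a hypothetical picture of what such an initial script might look like as a machine-readable structure: a tree of scenes and shots annotated with devices, clips, and aligned time codes. All field names and values are illustrative assumptions.

```python
# Hypothetical machine-readable initial script: a tree of scenes and shots,
# with clips from multiple devices aligned in the time domain. All field
# names and values are illustrative.
initial_script = {
    "event": {"type": "wedding", "date": "2021-06-12", "location": "garden"},
    "devices": ["phone_A", "phone_B", "camcorder_C"],
    "scenes": [
        {
            "scene_id": 1,
            "description": "vow exchange",
            "shots": [
                {"shot_id": 1, "device": "phone_A", "clip": "IMG_0301.mp4",
                 "in": "00:00:12", "out": "00:01:40"},
                {"shot_id": 2, "device": "camcorder_C", "clip": "C0007.mp4",
                 "in": "00:03:02", "out": "00:04:31"},
            ],
        },
    ],
}
```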
As most amateur productions do not have predefined storylines or production scripts, the users lack a clear outline to organize the contents for editing purposes. The script generated by the post-production editing platform offers the users a top-level overview of the contents and the relationships between contents captured by different devices, thereby facilitating subsequent editing operations to be performed on the contents.
For example, audio and text analysis using NLP algorithms can be adopted to classify speech and extract keywords. The audio data can be converted into closed captions using voice recognition techniques. Audio analysis can also extract non-verbal information such as applauding, cheering, and/or background music or sound.
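As one possible realization of the voice-recognition step, the sketch below converts a clip's audio into timestamped caption segments using the open-source openai-whisper package; the model size and file names are assumptions.

```python
# Sketch: convert a clip's audio into timestamped caption segments using the
# open-source openai-whisper package (pip install openai-whisper).
import whisper

def generate_captions(audio_path):
    model = whisper.load_model("base")   # small, general-purpose model
    result = model.transcribe(audio_path)
    # Each segment carries start/end times (in seconds) and recognized text.
    return [
        {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        for seg in result["segments"]
    ]

# Example: captions = generate_captions("scene3_phone_A.wav")
```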
In some embodiments, besides the preliminary facial recognition and/or object detection operations, computer vision technologies can be used to identify actions and motions accurately. For example, techniques such as optical flow can be used to track human action and/or object movements. Based on the information provided by the user (e.g., the nature of the events, the location at which the footages were captured, etc.) and the recognized objects/characters, sequential actions that have been identified can be linked to form a semantic context. The shots and/or scenes associated with the actions can then be provided with corresponding semantic labels. For example, given a well-defined scene, such as the vow exchange at a wedding, the actions performed by the characters can be labeled with corresponding semantic meanings with high confidence. For scenes that do not have well-defined structures and/or semantic contexts, the system can indicate that the derived semantic meaning is given a low confidence level, and the user can be prompted to refine or improve the semantic labeling of the actions in those scenes.
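For example, dense optical flow can be used to build a per-frame motion profile that helps flag action-heavy moments. The sketch below assumes OpenCV's Farneback optical flow implementation; the sampling step is illustrative.

```python
# Sketch: build a per-frame motion profile with dense optical flow; frames
# with high average flow magnitude suggest action-heavy moments.
import cv2
import numpy as np

def motion_profile(video_path, step=5):
    capture = cv2.VideoCapture(video_path)
    ok, prev = capture.read()
    if not ok:
        return []
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    profile, idx = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        idx += 1
        if idx % step:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        profile.append((idx, float(np.mean(magnitude))))  # mean motion per sampled frame
        prev_gray = gray
    capture.release()
    return profile
```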
In some embodiments, one or more neural networks can be trained to provide more accurate context labeling for scenes/shots. Different domain-specific networks can be used for scenes that are well-defined (e.g., weddings, performances, etc.) as well as scenes that lack well-defined structures (e.g., family picnics). In particular, a recurrent neural network (RNN) is a class of artificial neural networks in which connections between nodes form a directed graph along a temporal sequence. In some embodiments, a domain-specific RNN (e.g., for wedding events) can be trained to provide semantic meaning for certain shots/scenes in wedding footages. Another domain-specific RNN (e.g., for picnics) can be trained to label certain shots/scenes in footages that capture family picnics. The RNNs can first be trained offline using a small set of training data with predefined correspondence between actions (e.g., applause following a speech, laughter after a joke). Online training can further be performed on the RNNs based on feedback from the user. For example, once the system derives a semantic meaning with a low confidence level, the user can be prompted to provide correction and/or refinement of the semantic meaning. The user input can be used to further train the model to achieve higher accuracy for subsequent processing.
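The following sketch illustrates, under stated assumptions, what such a domain-specific recurrent labeler could look like: a GRU that maps a sequence of per-shot feature vectors to semantic labels, with low-confidence predictions surfaced to the user for correction. It uses PyTorch; the feature dimension, hidden size, and label set are placeholders, not values prescribed by this document.

```python
# Sketch: a domain-specific recurrent labeler. A GRU maps a sequence of
# per-shot feature vectors (e.g., pooled audio/visual embeddings) to semantic
# labels such as "speech", "applause", or "vow exchange".
import torch
import torch.nn as nn

class ShotLabeler(nn.Module):
    def __init__(self, feature_dim=512, hidden_dim=128, num_labels=8):
        super().__init__()
        self.rnn = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_labels)

    def forward(self, shot_features):            # (batch, num_shots, feature_dim)
        hidden, _ = self.rnn(shot_features)
        return self.head(hidden)                  # per-shot label logits

model = ShotLabeler()
logits = model(torch.randn(1, 10, 512))           # 10 shots from one footage
confidence, labels = torch.softmax(logits, dim=-1).max(dim=-1)
# Shots with low confidence can be surfaced to the user; the corrections can
# then serve as additional (online) training data.
```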
The results of the semantic analysis can be summarized to supplement the initial script generated by the platform. Table 2 shows an example script with semantic information in accordance with the present technology. Additions and/or updates to the initial script based on semantic analysis results are underlined.
In some embodiments, based on the type of the event, the template can pre-define one or more scenes with corresponding semantic meanings that can be matched to the captured content. For example, for a wedding event, the template can define a default scene for the speech of the groom's father. The scene can come with predefined semantic information. After performing the audio and video analysis, the platform can match the shots and/or clips to the predefined speech scene and update the script accordingly.
In some embodiments, the script can be further modified to include the editing operations to be performed on the footages. For example, shots can be cut for each scene; multiple clips from different devices can be stitched together. In addition to the cutting/editing locations determined based on the template, the post-production editing platform can determine whether there are dramatic changes in the footage indicating “dramatic moments,” which can serve as potential cut positions for further cutting/editing of the footage.
In some embodiments, the lengths of the scenes can be adjusted according to the desired length of the entire content. The original background music or sound can be replaced by different sound effects. Transition effects between the scenes can also be added to the script. Table 3 shows an example script with editing operations in accordance with the present technology. The example changes to the script and editing operations are underlined in Table 3. Based on information in the script, the platform performs editing of the footages accordingly.
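To make the editing step concrete, the sketch below cuts two shots, stitches them with a cross-fade transition, and replaces the background sound, roughly following the kind of instructions such a script could carry. It assumes the MoviePy 1.x API; all file names and time codes are illustrative and are not taken from the tables referenced above.

```python
# Sketch: apply script-specified editing operations -- cut shots, stitch clips
# from different devices with a transition, and replace the soundtrack.
from moviepy.editor import (AudioFileClip, VideoFileClip,
                            concatenate_videoclips)

# Cut the shots listed in the script (device clip + in/out points, in seconds).
shot_1 = VideoFileClip("phone_A/IMG_0301.mp4").subclip(12, 100)
shot_2 = VideoFileClip("camcorder_C/C0007.mp4").subclip(182, 271)

# Stitch with a 1-second cross-fade between the scenes.
edited = concatenate_videoclips(
    [shot_1, shot_2.crossfadein(1.0)], padding=-1.0, method="compose")

# Replace the original background sound with a music clip of matching length.
music = AudioFileClip("templates/wedding_theme.mp3").subclip(0, edited.duration)
edited = edited.set_audio(music)

edited.write_videofile("edited_wedding_cut.mp4")
```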
In some embodiments, the platform can implement a cloud-based film editing system (CFES) to perform a range of editing operations in a fully automated multimedia editing platform, enabling automatic editing according to a storyline that is represented as a machine-readable script. Such a CFES system can be implemented in various configurations in which computer servers, or parts or components of the CFES system, may be geographically or physically located in different regions or locations, so that users of the CFES system can send captured videos to the CFES system for editing from any user location with internet access to the CFES system and retrieve the CFES-edited video.
As a specific example, the accompanying figure shows a movie production system 900 whose subsystems are described below.
The MPUS 901 provides a user interface that guides users to work through the pre-production process. Based on the genre or visual style of the film, the MPUS 901 can generate machine-readable scripts for the scenes and determine a preliminary production schedule for the user. The MPDS 905 serves the role of the director in an automated film production. The scripts generated by the MPUS 901 are loaded into the MPDS 905 for further processing. Based on the geographical locations of the scenes/shots, the required equipment, and the personnel involved, the MPDS 905 can determine dependencies and/or constraints among various scenes. During the production time, the MPDS 905 can accurately determine the start time and the duration of each shot and each scene and make adjustments accordingly. The EDMS 903 is a proxy server that receives instructions from the MPDS 905 and relays the instructions to all end devices and personnel during the film shooting. The EDMS 903 can be used to provide device registration, device control, device synchronization, device tracking, and encoding support for the production of the content. The CFES 907 carries out most of the post-production activities in an automated way; it can either operate on all multimedia contents after the film shooting is completed or operate in real time on multimedia contents streamed from end devices while the content is being captured at the scene. In various implementations, the CFES can be designed to provide film editing, audio editing, multimedia quality enhancement, and commercial insertion.
In some implementations, the movie production system 900 can be offered to a user as a complete system for production of a movie or TV show, while in other implementations, one or more of the sub-systems in the system 900 can be accessed by a user to facilitate part of a particular production of a movie or a TV show. For example, the CFES 907 implemented using the disclosed techniques can be an integrated or an independent post-production editing system available to users. The CFES 907 includes one or more processors and one or more memories including processor-executable code. The processor-executable code, upon execution by the one or more processors, is operable to configure the one or more processors to receive one or more machine-readable scripts corresponding to one or more scenes of a storyline. The one or more machine-readable scripts include information about multimodal data and editing instructions for each of the one or more scenes. The one or more processors are configured to receive multiple streams of multimedia content corresponding to the one or more scenes, identify at least one change in an audio or video feature in the multiple streams of multimedia content based on the multimodal data for each of the one or more scenes, edit the multiple streams of multimedia content based on the editing instructions and selectively based on the identified change, and generate a final stream of multimedia content based on the edited multiple streams. Details regarding the CFES are further described in International Application No. PCT/US2020/032217, entitled “Fully Automated Post-Production Editing for Movies, TV Shows, and Multimedia Contents,” filed on May 8, 2020, which is incorporated by reference in its entirety.
In some embodiments, prior to the content being distributed to various social media platforms, the user may desire to make additional changes to the editing effects. At this stage, the user can be presented with the complete script, which includes the editing instructions as well as the structure of the content. The script also shows how different clips/shots are interrelated to form the edited content. The user now has the option to use simple user interface controls (e.g., selections between different transition types, selections between different angles of the footages) to modify the editing effects without the need to possess professional knowledge about video editing or software programs. The platform can provide a revised version of the edited content based on control input so that the editing operations can be performed in an interactive and iterative manner. In some embodiments, instead of using the provided user interface controls, the user can manually edit the script to incorporate the desired editing effects. The system updates the edited content according to the changes in the script to provide timely feedback to the user.
Once the footages are edited, the edited content can be packaged and distributed to a target platform.
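As an illustration of the packaging step, the sketch below re-encodes the edited content with ffmpeg using per-platform presets; the preset names, resolutions, and bitrates are assumptions rather than requirements of any particular social media platform.

```python
# Sketch: package the edited content for different target platforms by
# re-encoding with ffmpeg (cropping/padding for aspect ratio omitted for brevity).
import subprocess

PRESETS = {
    "landscape_hd": {"scale": "1920:1080", "video_bitrate": "8M"},
    "vertical_short": {"scale": "1080:1920", "video_bitrate": "6M"},
}

def package(input_path, output_path, preset_name):
    preset = PRESETS[preset_name]
    subprocess.run([
        "ffmpeg", "-y", "-i", input_path,
        "-vf", f"scale={preset['scale']}",
        "-c:v", "libx264", "-b:v", preset["video_bitrate"],
        "-c:a", "aac",
        output_path,
    ], check=True)

# Example: package("edited_wedding_cut.mp4", "wedding_vertical.mp4", "vertical_short")
```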
The processor(s) 705 may include central processing units (CPUs) to control the overall operation of, for example, the host computer. In certain embodiments, the processor(s) 705 accomplish this by executing software or firmware stored in memory 710. The processor(s) 705 may be, or may include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.
The memory 710 can be or include the main memory of the computer system. The memory 710 represents any suitable form of random access memory (RAM), read-only memory (ROM), flash memory, or the like, or a combination of such devices. In use, the memory 710 may contain, among other things, a set of machine instructions which, when executed by the processor(s) 705, causes the processor(s) 705 to perform operations to implement embodiments of the presently disclosed technology.
Also connected to the processor(s) 705 through the interconnect 725 is an (optional) network adapter 715. The network adapter 715 provides the computer system 700 with the ability to communicate with remote devices, such as storage clients and/or other storage servers, and may be, for example, an Ethernet adapter or Fibre Channel adapter.
In some embodiments, the method includes presenting, to a user via a user interface, the script and the edited multimedia content; receiving input from the user via the user interface to update at least part of the script; and generating a revised version of the edited multimedia content based on the updated script in an iterative manner.
In some embodiments, the method includes extracting information about time or location at which the event has been captured based on metadata embedded in the one or more footages. In some embodiments, the structure of the multiple temporal units specifies that a scene includes multiple shots, and one or more clips from at least one device correspond to a same shot. In some embodiments, the method includes assigning a time domain location for each of the multiple temporal units of the one or more footages and aligning corresponding temporal units based on the time domain location. In some embodiments, the method also includes identifying one or more characters or one or more gestures in the one or more footages and refining the aligning of the corresponding temporal units based on the identified one or more characters or the identified one or more gestures.
In some embodiments, the method includes extracting text or background sound from the one or more footages based on the audio analysis and modifying the script to include the extracted text or the background sound. In some embodiments, the method includes replacing the background sound using an alternative sound determined based on the semantic meaning of the one or more footages.
In some embodiments, the semantic meaning comprises an association between some of the one or more characters that is determined based on the video analysis of the one or more footages. In some embodiments, the method includes packaging the edited multimedia content based on a target online media platform and distributing the packaged multimedia content to the target online media platform.
In operation, the post-production editing system 1000 may be connected as part of a multimedia content system or be accessed by users for performing desired post-production editing operations. Such a multimedia content system can include an input device that comprises at least a camera (e.g., 1002a and/or 1002b), as shown in the accompanying figure.
The above examples demonstrate that the techniques and systems disclosed in this patent document can be adopted widely to produce professionally edited multimedia contents based on content captured by users with multiple devices. Instead of performing a one-stop automated editing operation, the disclosed system aims to reconstruct a professional production structure (e.g., a reconstructed production script) from raw user-generated contents so as to enable content editing at the professional level. The reconstructed script allows the users to quickly understand the correspondence between the shots/scenes, the editing effects, and the different media files, thereby enabling the users to iteratively make appropriate editing choices if so desired.
Implementations of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing unit” or “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (also known as a program, software, software application, machine-readable script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.
This patent application is a continuation application of and claims the benefit of and priority to International Patent Application No. PCT/US2021/056839, filed Oct. 27, 2021, which is a continuation-in-part of U.S. patent application Ser. No. 17/082,941 of the same title by the same inventors, filed on Oct. 28, 2020. The entire contents of the aforementioned patent applications are incorporated by reference as part of the disclosure of this application.
Number | Name | Date | Kind |
---|---|---|---|
6466655 | Clark | Oct 2002 | B1 |
8244104 | Kashiwa | Aug 2012 | B2 |
8560951 | Snyder | Oct 2013 | B1 |
8910201 | Zamiska et al. | Dec 2014 | B1 |
9106812 | Price et al. | Aug 2015 | B1 |
9998722 | Swearingen et al. | Jun 2018 | B2 |
10057537 | MacDonald-King et al. | Aug 2018 | B1 |
10721377 | Wu et al. | Jul 2020 | B1 |
11070888 | Wu et al. | Jul 2021 | B1 |
11107503 | Wu et al. | Aug 2021 | B2 |
11166086 | Wu et al. | Nov 2021 | B1 |
11315602 | Wu et al. | Apr 2022 | B2 |
11321639 | Wu et al. | May 2022 | B1 |
11330154 | Wu et al. | May 2022 | B1 |
11564014 | Wu et al. | Jan 2023 | B2 |
11570525 | Wu et al. | Jan 2023 | B2 |
20020099577 | Black | Jul 2002 | A1 |
20030061610 | Errico | Mar 2003 | A1 |
20030225641 | Gritzmacher et al. | Dec 2003 | A1 |
20060053041 | Sakai | Mar 2006 | A1 |
20060251382 | Vronay et al. | Nov 2006 | A1 |
20060282783 | Covell et al. | Dec 2006 | A1 |
20060282785 | McCarthy et al. | Dec 2006 | A1 |
20070099684 | Butterworth | May 2007 | A1 |
20080010601 | Dachs | Jan 2008 | A1 |
20080028318 | Shikuma | Jan 2008 | A1 |
20080033801 | McKenna et al. | Feb 2008 | A1 |
20080036917 | Pascarella et al. | Feb 2008 | A1 |
20080101476 | Tian et al. | May 2008 | A1 |
20090063659 | Kazerouni et al. | Mar 2009 | A1 |
20090279840 | Kudo et al. | Nov 2009 | A1 |
20110085025 | Pace et al. | Apr 2011 | A1 |
20110162002 | Jones et al. | Jun 2011 | A1 |
20110206351 | Givoly | Aug 2011 | A1 |
20110249953 | Suri et al. | Oct 2011 | A1 |
20120294589 | Samra et al. | Nov 2012 | A1 |
20130067333 | Brenneman | Mar 2013 | A1 |
20130124984 | Kuspa | May 2013 | A1 |
20130151970 | Achour | Jun 2013 | A1 |
20130166625 | Swaminathan et al. | Jun 2013 | A1 |
20130167168 | Ellis et al. | Jun 2013 | A1 |
20130177294 | Kennberg | Jul 2013 | A1 |
20130204664 | Romagnolo et al. | Aug 2013 | A1 |
20130232178 | Katsambas | Sep 2013 | A1 |
20130290557 | Baratz | Oct 2013 | A1 |
20140082079 | Dunsmuir | Mar 2014 | A1 |
20140119428 | Catchpole et al. | May 2014 | A1 |
20140132841 | Beaulieu-Jones et al. | May 2014 | A1 |
20140133834 | Shannon | May 2014 | A1 |
20140242560 | Movellan et al. | Aug 2014 | A1 |
20140328570 | Cheng | Nov 2014 | A1 |
20150012325 | Maher | Jan 2015 | A1 |
20150043892 | Groman | Feb 2015 | A1 |
20150082349 | Ishtiaq et al. | Mar 2015 | A1 |
20150256858 | Xue | Sep 2015 | A1 |
20150261403 | Greenberg et al. | Sep 2015 | A1 |
20150281710 | Sievert et al. | Oct 2015 | A1 |
20150302893 | Shannon | Oct 2015 | A1 |
20150363718 | Boss et al. | Dec 2015 | A1 |
20150379358 | Renkis | Dec 2015 | A1 |
20160027198 | Terry et al. | Jan 2016 | A1 |
20160050465 | Zaheer et al. | Feb 2016 | A1 |
20160071544 | Waterston et al. | Mar 2016 | A1 |
20160132546 | Keating | May 2016 | A1 |
20160292509 | Kaps et al. | Oct 2016 | A1 |
20160323483 | Brown | Nov 2016 | A1 |
20160350609 | Mason et al. | Dec 2016 | A1 |
20160360298 | Chalmers et al. | Dec 2016 | A1 |
20170017644 | Accardo et al. | Jan 2017 | A1 |
20170048492 | Buford | Feb 2017 | A1 |
20170169853 | Hu et al. | Jun 2017 | A1 |
20170178346 | Ferro et al. | Jun 2017 | A1 |
20170337912 | Caligor et al. | Nov 2017 | A1 |
20170358023 | Peterson | Dec 2017 | A1 |
20180005037 | Smith et al. | Jan 2018 | A1 |
20180213289 | Lee et al. | Jul 2018 | A1 |
20190045194 | Zavesky | Feb 2019 | A1 |
20190058845 | MacDonald-King et al. | Feb 2019 | A1 |
20190075148 | Nielsen | Mar 2019 | A1 |
20190107927 | Schriber et al. | Apr 2019 | A1 |
20190155829 | Schriber et al. | May 2019 | A1 |
20190215421 | Parthasarathi et al. | Jul 2019 | A1 |
20190215540 | Nicol et al. | Jul 2019 | A1 |
20190230387 | Gersten | Jul 2019 | A1 |
20190244639 | Benedetto | Aug 2019 | A1 |
20190354763 | Stojancic | Nov 2019 | A1 |
20190356948 | Stojancic | Nov 2019 | A1 |
20200065612 | Xu et al. | Feb 2020 | A1 |
20200081596 | Greenberg et al. | Mar 2020 | A1 |
20200168186 | Yamamoto | May 2020 | A1 |
20200213644 | Gupta et al. | Jul 2020 | A1 |
20200312368 | Waterman | Oct 2020 | A1 |
20200327190 | Agrawal et al. | Oct 2020 | A1 |
20200364668 | Altunkaynak | Nov 2020 | A1 |
20200396357 | Wu et al. | Dec 2020 | A1 |
20210011960 | Chambon-Cartier | Jan 2021 | A1 |
20210084085 | Jones et al. | Mar 2021 | A1 |
20210104260 | Wu et al. | Apr 2021 | A1 |
20210152619 | Bercovich | May 2021 | A1 |
20210185222 | Zavesky et al. | Jun 2021 | A1 |
20210211779 | Wu et al. | Jul 2021 | A1 |
20210264161 | Saraee et al. | Aug 2021 | A1 |
20210350829 | Wu et al. | Nov 2021 | A1 |
20210398565 | Wu et al. | Dec 2021 | A1 |
20220070540 | Wu et al. | Mar 2022 | A1 |
20220254378 | Wu et al. | Aug 2022 | A1 |
20230041641 | Wu et al. | Feb 2023 | A1 |
Number | Date | Country |
---|---|---|
3038767 | Oct 2019 | CA |
101316362 | Dec 2008 | CN |
101365094 | Feb 2009 | CN |
101960440 | Jan 2011 | CN |
104581222 | Apr 2015 | CN |
108447129 | Aug 2018 | CN |
109196371 | Jan 2019 | CN |
109783659 | May 2019 | CN |
109905732 | Jun 2019 | CN |
2000101647 | Apr 2000 | JP |
2004105035 | Dec 2004 | WO |
2008156558 | Dec 2008 | WO |
2010068175 | Jun 2010 | WO |
2011004381 | Jan 2011 | WO |
2014090730 | Jun 2014 | WO |
2021074721 | Apr 2021 | WO |
Entry |
---|
International Search Report and Written Opinion dated Mar. 10, 2020 in International Application No. PCT/CN2019/090722, 10 pages. |
Davenport, Glorianna, et al., “Cinematic primitives for multimedia”, MIT Media Laboratory, IEEE Computer graphics and Applications, pp. 67-74, Jul. 1991. |
International Search Report and Written Opinion dated May 7, 2020 for International Application No. PCT/CN2019/099534, filed on Aug. 7, 2019 (9 pages). |
International Search Report and Written Opinion dated May 27, 2020 for International Application No. PCT/CN2019/109919, filed on Oct. 8, 2019 (11 pages). |
International Search Report and Written Opinion dated Aug. 7, 2020 for International Application No. PCT/US2020/032217, filed on May 8, 2020 (10 pages). |
International Search Report and Written Opinion dated Jan. 3, 2022 for International Application No. PCT/US2021/047407, filed on Aug. 24, 2021 (20 pages). |
P. Minardi and B. Alonso, “How Automation Can Help Broadcasters and Production Companies Reach Video Production Nirvana,” SMPTE17: Embracing Connective Media, 2015, pp. 1-12, doi: 10.5594/M001738. (Year: 2015). |
International Search Report and Written Opinion dated Feb. 28, 2022 for International Application No. PCT/US2021/056839, filed on Oct. 27, 2021 (16 pages). |
Hua et al., “AVE—Automated Home Video Editing,” Proceedings of the 11th ACM International Conference on Multimedia, MM '03, Berkeley, CA, Nov. 2-8, 2003. |
Tunikova, Oksana, “Product Placement—A Good Advertising Adaptation?,” Business 2 Community, available at https://www.business2community.com/marketing/product-placement-good-advertising-adaptation-02026643. |
Extended European Search Report for European Patent Application No. 19932602.6, dated Nov. 25, 2022 (8 pages). |
Office Action for Chinese Patent Application No. 201980098650.5, dated Nov. 10, 2022 (15 pages). |
International Search Report and Written Opinion dated Apr. 21, 2023 for International Application No. PCT/US2022/081244 (23 pages). |
Number | Date | Country | |
---|---|---|---|
20220132223 A1 | Apr 2022 | US |
Number | Date | Country | |
---|---|---|---|
Parent | PCT/US2021/056839 | Oct 2021 | US |
Child | 17517520 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 17082941 | Oct 2020 | US |
Child | PCT/US2021/056839 | US |