Many media editing applications currently exist for creating media presentations by compositing several pieces of media content such as video, audio, animation, still images, etc. Such applications give graphical designers, media artists, and other users the ability to edit, combine, transition, overlay, and piece together different media content in a variety of manners to create a resulting composite presentation. Examples of media editing applications include Final Cut Pro® and iMovie®, both sold by Apple® Inc.
The media editing applications include a graphical user interface (“GUI”) that provides different tools for creating and manipulating media content. These tools include different controls for creating a movie by selecting source video clips from a library and adding background music. The tools allow addition of titles, transitions, photos, etc., to further enhance the movie.
The tools further allow manual selection and addition of sound effects to different visual elements or graphical cues such as titles and transitions. Often, in professionally produced multimedia content, sound designers craft sound effects (or audio effects) to augment graphical cues. For instance, the sounds that accompany titles in broadcast sports or news are typically created, chosen, placed, timed, and leveled manually by someone trained in sound design to make a specific storytelling or creative point without the use of dialogue or music.
However, these visual elements can have different lengths, and the sound effects can have different clips for coming-in and going-out sounds. Other video clips and visual elements can also start shortly after any visual element. Therefore, the sound effects added for each visual element require considerable effort to be manually trimmed, faded in, faded out, and spotted to the right place on the clip by an expert. In addition, different movies can have different volume levels, and the volume level of the sound effects has to be manually adjusted to properly blend with the audio for the rest of the movie.
Some embodiments provide an automated method to add custom sound effects to graphical elements added to a sequence of images such as a sequence of video clips in a movie. The method analyzes the properties of graphical elements such as titles, transitions, and visual effects added to the video sequence. The properties include the type, style, duration, fade-in, fade-out, and other properties of the graphical elements.
The method then automatically and without human intervention selects one or more sound effects clips from a library for each graphical element. The method then trims each sound effects clip to fit the required duration of the graphical element. The method also fades the edges of the audio clip to ensure smoothness.
The method further analyzes the surrounding content and adjusts the volume of each sound clip to an appropriate level based on the adjacent content. The method then schedules the sound effects clips along with the other clips in the sequence to play during playback or monitoring.
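For illustration only, the overall pipeline described above can be sketched in a few lines of Python. Every name, data structure, and value below is hypothetical and chosen for readability; the sketch is not the implementation of any particular embodiment.

```python
def add_sound_effects(element, sound_library, timeline_clips):
    """Select, fit, level, and spot sound effects for one graphical element.

    element:        dict with "kind", "style", "start", "duration",
                    "fade_in", and "fade_out" keys (all hypothetical names).
    sound_library:  maps (kind, style) -> list of sound clip dicts.
    timeline_clips: audio clips already in the sequence, used to pick a
                    blending level; each has "start", "duration", "gain".
    Returns the sound effect clips to schedule for playback or monitoring.
    """
    scheduled = []
    for clip in sound_library.get((element["kind"], element["style"]), []):
        fitted = dict(clip)
        fitted["duration"] = min(clip["duration"], element["duration"])  # trim to fit
        fitted["fade_in"] = element["fade_in"]                           # fade the edges
        fitted["fade_out"] = element["fade_out"]
        fitted["gain"] = surrounding_gain(timeline_clips, element)       # blend with neighbors
        fitted["start"] = element["start"]                               # spot on the timeline
        scheduled.append(fitted)
    return scheduled


def surrounding_gain(timeline_clips, element, default=1.0):
    """Average gain of the clips that overlap the element's time span."""
    overlapping = [c["gain"] for c in timeline_clips
                   if c["start"] < element["start"] + element["duration"]
                   and c["start"] + c["duration"] > element["start"]]
    return sum(overlapping) / len(overlapping) if overlapping else default
```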
The uses of the disclosed method include, but are not limited to, applying sounds to transitions between media clips, applying sounds in conjunction with animated titles, adding sounds to visual effects or filters, etc. The method can also be utilized by any animation engine that requires sound effects to be added to animated visual elements.
The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, Detailed Description and the Drawings is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, Detailed Description and the Drawings, but rather are to be defined by the appended claims, because the claimed subject matters can be embodied in other specific forms without departing from the spirit of the subject matters.
The novel features of the invention are set forth in the appended claims. However, for purpose of explanation, several embodiments of the invention are set forth in the following figures.
In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.
Some embodiments provide a method for automatically adding sound to animated graphical elements (also referred to herein as visual elements or visual cues) such as the title of a video clip or transitions between video clips. The method analyzes metadata and audio from the video clip. The method then automatically adds sound effects to the graphical elements. The method also retimes and trims the sound effects to fit. The method also analyzes the surrounding content and adjusts the volume and fades the sound based on that analysis.
The video clips in a movie can be monitored by activating a control (such as hitting the space bar key) to start playing the movie. The movie is played in the monitoring area 135. When the movie is stopped, the frame immediately below the play head 140 is displayed in the monitoring area 135.
In stage 102, a control 145 is activated to show a set 150 of visual elements such as titles, transitions, visual filters, etc., to add to the movie. In this example, the added visual element 155 is a transition and is added (as conceptually shown by a finger 180) between the two video clips 120 and 125. In stage 103, automatically and without any further inputs from a human, sound effects are added to the movie for the visual element. The sound effects are added based on the properties of the visual elements such as the type (e.g., transition or title), style (e.g., cut, swipe, spin), the number of events in the visual element (e.g., coming in, zooming in, zooming out, fading out), duration, etc.
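As a purely illustrative aid (the field names below are hypothetical and not drawn from any embodiment), the properties consulted at this stage can be thought of as a small record:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VisualElementProperties:
    """Hypothetical summary of the properties used to pick sound effects."""
    kind: str                 # e.g., "transition" or "title"
    style: str                # e.g., "cut", "swipe", "spin"
    duration: float           # seconds on the timeline
    events: List[str] = field(default_factory=list)  # e.g., ["coming in", "zooming out", "fading out"]
```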
Addition of the sound effects clip 170 is conceptually shown in the exploded area 175. The sound clips 160 and 165 corresponding to video clips 120 and 125 are also shown in the exploded area 175.
Several more detailed embodiments of the invention are described in sections below. Section I describes automatically adding custom sound effects for graphical elements in some embodiments. Next, Section II describes the software architecture of some embodiments. Finally, a description of an electronic system with which some embodiments of the invention are implemented is provided in Section III.
In some embodiments, a video editing project starts by creating a project and adding video clips to the project. A movie can be created by adding one or more video clips to a project.
At any time a theme can be selected and applied to the video project. Each theme provides a specific look and feel for the edited video. Examples of themes include modern, bright, playful, neon, travel, simple, news, CNN iReport, sports, bulletin board, and photo album. In the example of
In
In addition, the figure shows that a theme related transition 330 is added between the two video clips 305 and 310. In this example, the play head 335 is at the beginning of the transition and a transition related to neon theme is played in the preview display area 340.
In the example of
Once a graphical element such as a title or transition is added to a video project, sound effects can be added to the graphical element to further enhance the video project. In the past, a person using a media editing application had to manually select an audio file (e.g., from a library or by importing an audio file) and spot the audio file to the movie. In addition, the audio file had to be manually retimed, trimmed, and faded in and out to properly fit in the video clip sequence of the movie. The volume of the audio file also had to be adjusted to blend with the volume of the surrounding content in the video sequence.
Some embodiments provide an automatic method of selecting, adding, and adjusting sound effects to graphical elements of a video sequence without requiring user intervention.
The process then analyzes (at 510) the properties of the added element. The process determines different properties of the graphical element such as type (e.g., transition or title), style (e.g., spin out transition, sport theme transition, news theme title, filmstrip theme title, etc.), and duration. The process also determines whether there is a fade-in and a fade-out and, if so, the durations of the fade-in and fade-out. In some embodiments, the graphical elements include metadata that describes different properties of the element. In these embodiments, process 500 also utilizes the metadata to determine the properties of a graphical element.
The process then chooses (at 515) a set of sound clips to apply to the graphical element based on the analysis of the properties of the element. Some embodiments maintain one or more sound effects for each graphical element that is provided by the media editing application. For instance, for each title in the list 235 of titles shown in
In some of these embodiments, process 500 performs a table lookup to identify a set of sound effects to apply to a graphical element. Depending on the type, duration, and other properties of the graphical element, the selected set of sound clips can have one or more sound clips. For instance, the set of selected sound effects for a particular title might have one sound clip for the coming-in period and one sound clip for the going-out period, while the selected set of sound clips for another title might only have one sound clip.
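A minimal sketch of such a lookup, assuming a Python dictionary keyed on the element's type and style; the styles and clip file names are invented for illustration and do not reflect the contents of any actual sound effects library:

```python
# Hypothetical table: each (type, style) key maps to the set of sound effect
# clips for that graphical element -- one clip per period where applicable.
SOUND_EFFECTS_TABLE = {
    ("transition", "spin out"):   ["spin_coming_in.aif", "spin_going_out.aif"],
    ("transition", "cut"):        ["cut.aif"],
    ("title", "news theme"):      ["news_sting_in.aif", "news_sting_out.aif"],
    ("title", "filmstrip theme"): ["filmstrip_whir.aif"],
}


def choose_sound_effects(element_type, element_style):
    """Return the set of sound effect clips for a graphical element, or an
    empty list when no entry exists for that type/style combination."""
    return SOUND_EFFECTS_TABLE.get((element_type, element_style), [])
```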
Referring back to
The process then retimes and trims (at 520) the sound clip to fit. For instance, the duration of the sound effects clip associated with a graphical element can be longer than the duration of the graphical element in order to allow the “tail” portion of the sound effects clip (e.g., the portion after the dashed line 655 shown in
Process 500 then analyzes (at 525) the surrounding content. The process then adjusts (at 530) the volume of the added sound effects based on the analysis of the surrounding content.
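The fitting and leveling operations can be illustrated with the following sketch; the tail allowance, the RMS-based level estimate, and the -3 dB headroom are simplifying assumptions, not values taken from the embodiments described above:

```python
import math


def fit_clip_length(sound_len, element_len, tail_allowance, gap_to_next_clip):
    """Trim a sound effect so its body fits the graphical element while its
    'tail' may run on, but never into the next clip (all values in seconds)."""
    return min(sound_len, element_len + min(tail_allowance, gap_to_next_clip))


def match_surrounding_level(effect_samples, surrounding_samples, headroom_db=-3.0):
    """Scale the effect so its RMS level sits just below that of the
    surrounding audio; perceptual loudness weighting is not modeled here."""
    def rms(samples):
        return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

    target = rms(surrounding_samples) * 10 ** (headroom_db / 20.0)
    current = rms(effect_samples)
    gain = target / current if current else 1.0
    return [s * gain for s in effect_samples]
```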
As shown in
A set of sound effects can have one or more audio clips. For instance, the set of sound effects selected for transition video clip 635 in
Examples of available titles are standard, prism, gravity, reveal, line, expand, focus, pop-up, drafting, sideways draft, vertical draft, horizontal blur, soft edge, lens flare, pull force, boogie lights, pixie dust, organic main, organic lower, ticker, date/time, clouds, far far away, gradient white, soft blur white paper, formal, gradient black, soft blur black, torn edge black, torn edge tan, etc. In addition, some embodiments provide at least one title for each theme. For instance, if the available themes are modern, bright, playful, neon, travel, simple, news, and CNN iReport, then at least one title per theme is provided in the list of available titles. In the example of
Examples of events (or sub-elements) of a graphical element include animations such as an image in a title or transition zooming in or zooming out; a scene fading in or fading out; a message popping out; text added to the screen as if being typed; animations such as swap, cut, page curl, fade, spin out, mosaic, ripple, blur, dissolve, and wipe; credits starting to scroll; etc.
In the example of
Referring back to
Process 800 then finds (at 815) the starting time and the duration of each event in the graphical element. The process then sets (at 820) the current event to the first event of the graphical element. The process then determines (at 825) the starting time of the sound effects clip for the current event based on the starting time of the event. The process in some embodiments also considers other properties of the graphical element such as the duration of the event in order to determine the starting time of the sound effects clip. For instance, if the duration of an event is too short, some embodiments do not add the sound effects clip for the event.
The process also retimes and/or trims (at 830) the sound effects clip for the current event of the graphical element based on the starting time and duration of the event, starting time of the next event, starting time and the duration (or ending time) of the graphical element, starting time of the next clip, etc. The process then determines (at 835) whether all events in the graphical element are examined. If yes, the process proceeds to 845, which is described below. Otherwise, the process sets (at 840) the current event to the next event in the graphical element. The process then proceeds to 825, which was described above.
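A minimal sketch of this per-event placement loop follows, under the assumption that each event is given as a (start time, duration, sound clip length) tuple; the minimum-duration threshold and all names are hypothetical:

```python
def spot_event_sounds(events, element_end, next_clip_start, min_event_duration=0.25):
    """Place one sound effect per event of a graphical element.

    events: list of (event_start, event_duration, sound_clip_length) tuples,
            in seconds.  Returns (start, trimmed_length) pairs; events that are
            too short get no sound effect at all.
    """
    placements = []
    for i, (start, duration, sound_len) in enumerate(events):
        if duration < min_event_duration:
            continue                                 # event too short for a sound
        # The sound may continue past the end of the event, but is trimmed so
        # it does not run past the next event, the end of the graphical
        # element, or the start of the next clip.
        limit = min(
            events[i + 1][0] if i + 1 < len(events) else element_end,
            element_end,
            next_clip_start,
        )
        placements.append((start, min(sound_len, limit - start)))
    return placements
```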
Process 800 determines (at 815 described above) the starting time and duration of each event for a graphical element. For instance, process 800 determines the starting time of the first event (starting at dashed line 1205), the duration of the first event (between dashed lines 1205 and 1210), the starting time of the second event (starting at dashed line 1215), the duration of the second event (between dashed lines 1215 and 1220), and the starting time of the next clip (not shown). In some embodiments, process 800 finds the time and duration of the events by analyzing the properties of the graphical element 930. For instance, the process analyzes the metadata associated with the graphical element or performs a table lookup to find the properties of each graphical element (e.g., based on the type and style of the graphical element, the theme of the video sequence, etc.).
The process then spots the audio clips (at 825 described above) to the proper location of the video sequence (i.e., determines where each audio clip has to start). The process optionally trims or fades (at 830 described above) each audio clip based on the duration of the event, starting time of the next event, the starting time of the next clip, etc. For instance, a portion of a sound clip for an event may continue after the end of the event. In the example of
Referring back to
Some embodiments allow several graphical elements to overlap each other or be added in the vicinity of each other. Different embodiments provide sound effects for these overlapping or nearby graphical elements differently. For instance, some embodiments overlap the corresponding sound effects clips after retiming, trimming, and/or fading one or more of the sound effects clips. Yet other embodiments favor one sound effects clip over the others.
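The two strategies mentioned above might be sketched as follows; the clip representation, the crossfade, and the 0.7 attenuation factor are arbitrary illustrative choices rather than values from any embodiment:

```python
def resolve_overlapping_effects(first, second, policy="mix"):
    """Combine two sound effect clips whose graphical elements overlap.

    Each clip is a dict with "start", "duration", "gain", "fade_in", and
    "fade_out" keys.  With the "mix" policy both clips play and are faded
    across the overlapping span; with "favor_first" the second clip is dropped.
    """
    overlap = (first["start"] + first["duration"]) - second["start"]
    if overlap <= 0:
        return [first, second]                    # no overlap: keep both as-is
    if policy == "favor_first":
        return [first]                            # favor one clip over the other
    faded_out = dict(first, fade_out=overlap)     # fade out the earlier clip...
    faded_in = dict(second, fade_in=overlap,      # ...fade in the later one,
                    gain=second["gain"] * 0.7)    # slightly attenuated
    return [faded_out, faded_in]
```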
In
As shown in
The title has two events starting at dashed lines 1715 and 1725. As shown, two sound effects audio clips 1730 and 1735 corresponding to the two events are added to the movie. In this example, the two audio clips are trimmed, retimed, and/or faded to fit the movie (as described above by reference to
The sound effects module 1805 communicates with titling module 1810, transition module 1815, and visual effect module 1820 to get information about the added graphical elements such as titles, transitions, visual filters, etc.
The sound effects module 1805 performs lookups into sound effects lookup table 1825 to find a set of sound effects clips for each graphical element. The sound effects module 1805 analyzes properties of each graphical element. Based on the properties of each graphical element, sound effects module 1805 selects a set of sound effects clips from sound effects files database 1830.
The sound effects module 1805 utilizes fading computation module 1835, trimming computation module 1840, and/or sound spotting module 1845 to perform fading, trimming, and spotting operations for the sound clip. The sound effects module 1805 stores the resulting sound effects and informs the video and sound scheduler module 1850 to schedule the added sound effects clips for playback.
The video and sound scheduler module 1850 schedules the video and audio clips including the sound effects clips during playback and monitoring and sends the clips to player 1860 to display on a display screen 1855. The player optionally sends the video and audio clips to renderer module 1865. The renderer module generates a sequence of video and audio clips and saves the sequence in rendered movies database 1870, for example to burn into storage media such as DVDs, Blu-Ray® discs, etc.
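For illustration, the coordination among these modules could be wired up as below; the class, its collaborators, and their call signatures are hypothetical and do not correspond to the numbered modules of the figures:

```python
class SoundEffectsCoordinator:
    """Illustrative stand-in for the sound effects module's coordination role."""

    def __init__(self, lookup_table, effects_db, trimmer, fader, spotter, scheduler):
        self.lookup_table = lookup_table   # (kind, style) -> list of clip ids
        self.effects_db = effects_db       # clip id -> audio clip
        self.trimmer = trimmer             # trimming computation
        self.fader = fader                 # fading computation
        self.spotter = spotter             # sound spotting
        self.scheduler = scheduler         # video and sound scheduler

    def handle_element(self, element):
        """Called when a titling, transition, or visual effect module reports
        a newly added graphical element (a dict with "kind" and "style")."""
        for clip_id in self.lookup_table.get((element["kind"], element["style"]), []):
            clip = self.effects_db[clip_id]
            clip = self.trimmer(clip, element)     # fit the element's duration
            clip = self.fader(clip, element)       # smooth the edges
            start = self.spotter(element)          # where the clip should begin
            self.scheduler.schedule(clip, start)   # queue for playback/monitoring
```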
In some embodiments, titling module 1810, transition module 1815, and visual effect module 1820 are part of a media editing application. In other embodiments, these modules are part of an animation engine that animates still images or video images and requires sound effects to be applied to visual elements.
As shown, the automatic sound effects creation system 1800 is similar to the automatic sound effects creation system described by reference to
The animation engine 1905 is any application that creates still image and/or video image animation and requires sound effects for the animations. Examples of animations include, but are not limited to, zoom in, zoom out, fade in, fade out, messages popping out, text added to an image, swap, cut, page curl, spin out, mosaic, ripple, blur, dissolve, wipe, text starting or stopping scrolling, etc.
The animation engine 1905 provides the properties of the animation (e.g., type, style, duration, number of events, etc.) of visual elements to the sound effects module 1805. The sound effects module 1805 performs lookups into sound effects lookup table 1825 to find a set of sound effects clips for each visual element. The sound effects module 1805 analyzes properties of each visual element. Based on the properties of each visual element, sound effects module 1805 selects a set of sound effects clips from sound effects files database 1830.
The sound effects module 1805 utilizes fading computation module 1835, trimming computation module 1840, and/or sound spotting module 1845 to perform fading, trimming, and spotting operations for the sound clip. The sound effects module 1805 sends the sound effects clips to the animation engine 1905 and optionally stores the clips in the sound effects files database 1830.
Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium, machine readable medium, machine readable storage). When these instructions are executed by one or more computational or processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, random access memory (RAM) chips, hard drives, erasable programmable read only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), etc. The computer readable media does not include carrier waves and electronic signals passing wirelessly or over wired connections.
In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage, which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the invention. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
A. Mobile Device
The automatic custom sound effects adding in some embodiments operates on mobile devices, such as smart phones (e.g., iPhones®) and tablets (e.g., iPads®).
The peripherals interface 2015 is coupled to various sensors and subsystems, including a camera subsystem 2020, a wireless communication subsystem(s) 2025, an audio subsystem 2030, an I/O subsystem 2035, etc. The peripherals interface 2015 enables communication between the processing units 2005 and various peripherals. For example, an orientation sensor 2045 (e.g., a gyroscope) and an acceleration sensor 2050 (e.g., an accelerometer) are coupled to the peripherals interface 2015 to facilitate orientation and acceleration functions.
The camera subsystem 2020 is coupled to one or more optical sensors 2040 (e.g., a charged coupled device (CCD) optical sensor, a complementary metal-oxide-semiconductor (CMOS) optical sensor, etc.). The camera subsystem 2020 coupled with the optical sensors 2040 facilitates camera functions, such as image and/or video data capturing. The wireless communication subsystem 2025 serves to facilitate communication functions. In some embodiments, the wireless communication subsystem 2025 includes radio frequency receivers and transmitters, and optical receivers and transmitters (not shown in
The I/O subsystem 2035 involves the transfer between input/output peripheral devices, such as a display, a touch screen, etc., and the data bus of the processing units 2005 through the peripherals interface 2015. The I/O subsystem 2035 includes a touch-screen controller 2055 and other input controllers 2060 to facilitate the transfer between input/output peripheral devices and the data bus of the processing units 2005. As shown, the touch-screen controller 2055 is coupled to a touch screen 2065. The touch-screen controller 2055 detects contact and movement on the touch screen 2065 using any of multiple touch sensitivity technologies. The other input controllers 2060 are coupled to other input/control devices, such as one or more buttons. Some embodiments include a near-touch sensitive screen and a corresponding controller that can detect near-touch interactions instead of or in addition to touch interactions.
The memory interface 2010 is coupled to memory 2070. In some embodiments, the memory 2070 includes volatile memory (e.g., high-speed random access memory), non-volatile memory (e.g., flash memory), a combination of volatile and non-volatile memory, and/or any other type of memory. As illustrated in
The memory 2070 also includes communication instructions 2074 to facilitate communicating with one or more additional devices; graphical user interface instructions 2076 to facilitate graphic user interface processing; image processing instructions 2078 to facilitate image-related processing and functions; input processing instructions 2080 to facilitate input-related (e.g., touch input) processes and functions; audio processing instructions 2082 to facilitate audio-related processes and functions; and camera instructions 2084 to facilitate camera-related processes and functions. The instructions described above are merely exemplary and the memory 2070 includes additional and/or other instructions in some embodiments. For instance, the memory for a smartphone may include phone instructions to facilitate phone-related processes and functions. Additionally, the memory may include instructions for automatic custom sound effects adding to graphical elements as well as instructions for other applications. The above-identified instructions need not be implemented as separate software programs or modules. Various functions of the mobile computing device can be implemented in hardware and/or in software, including in one or more signal processing and/or application specific integrated circuits.
While the components illustrated in
B. Computer System
The bus 2105 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 2100. For instance, the bus 2105 communicatively connects the processing unit(s) 2110 with the read-only memory 2130, the GPU 2115, the system memory 2120, and the permanent storage device 2135.
From these various memory units, the processing unit(s) 2110 retrieves instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments. Some instructions are passed to and executed by the GPU 2115. The GPU 2115 can offload various computations or complement the image processing provided by the processing unit(s) 2110.
The read-only-memory (ROM) 2130 stores static data and instructions that are needed by the processing unit(s) 2110 and other modules of the electronic system. The permanent storage device 2135, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 2100 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive, integrated flash memory) as the permanent storage device 2135.
Other embodiments use a removable storage device (such as a floppy disk, flash memory device, etc., and its corresponding drive) as the permanent storage device. Like the permanent storage device 2135, the system memory 2120 is a read-and-write memory device. However, unlike storage device 2135, the system memory 2120 is a volatile read-and-write memory, such as random access memory. The system memory 2120 stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 2120, the permanent storage device 2135, and/or the read-only memory 2130. For example, the various memory units include instructions for processing multimedia clips in accordance with some embodiments. From these various memory units, the processing unit(s) 2110 retrieves instructions to execute and data to process in order to execute the processes of some embodiments.
The bus 2105 also connects to the input and output devices 2140 and 2145. The input devices 2140 enable the user to communicate information and select commands to the electronic system. The input devices 2140 include alphanumeric keyboards and pointing devices (also called “cursor control devices”), cameras (e.g., webcams), microphones or similar devices for receiving voice commands, etc. The output devices 2145 display images generated by the electronic system or otherwise output data. The output devices 2145 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD), as well as speakers or similar audio output devices. Some embodiments include devices such as a touchscreen that function as both input and output devices.
Finally, as shown in
Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself. In addition, some embodiments execute software stored in programmable logic devices (PLDs), ROM, or RAM devices.
As used in this specification and any claims of this application, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” or “displaying” mean displaying on an electronic device. As used in this specification and any claims of this application, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. For instance, many of the figures illustrate various touch gestures (e.g., taps). However, many of the illustrated operations could be performed via different touch gestures (e.g., double tap gesture, press and hold gesture, swipe instead of tap, etc.) or by non-touch input (e.g., using a cursor controller, a keyboard, a touchpad/trackpad, a near-touch sensitive screen, etc.). In addition, a number of the figures (including