This disclosure relates generally to editing digital multimedia content.
Digital multimedia content, for example, audio, video, images, and the like, can be captured using media capturing devices, such as microphones, video cameras, and the like. The content can be transferred from the capturing devices to computer systems and viewed or edited (or both) using one or more computer software applications. Digital multimedia content can include both audio content and video content. For example, a video camera can capture video and audio of two persons having a conversation. The audio and video can be edited with a digital multimedia editing application. Some editing applications provide a user interface for displaying video and audio objects representing video and audio content. When the audio objects are edited by the user, they may fall out of sync with the video objects, making the editing process difficult.
This disclosure describes technologies relating to editing audio in digital multimedia using user interfaces. In some implementations, a digital multimedia editing application provides lanes for displaying video and audio objects (e.g., a conversation between actors) as well as effects (FX) objects and a music sound track. Each object in a lane corresponds to a video, audio, or effects file stored in the computer system. An audio object may represent a multichannel audio signal (e.g., a stereo or surround mix). One or more user interfaces provided by the editing application allow a user to visually separate the audio components of the multichannel audio signal and to edit those components independently by, for example, adjusting volume, equalizing, panning, or applying effects. These editing operations are performed on audio objects while maintaining synchronization with corresponding video objects in a video lane.
One innovative aspect of the subject matter described here can be implemented as a computer-implemented method. In a first portion of a user interface, an item of digital multimedia content that includes video content and audio content that is synchronized with the video content is displayed. The audio content includes audio from multiple audio components. In a second portion of the user interface, multiple audio objects, each representing an audio component of the multiple audio components, are displayed. An input to an audio object of the multiple audio objects is detected. In response to detecting the input to the audio object, at least one feature of an audio component that the audio object represents is modified while maintaining a synchronization of the video content and the audio content.
This, and other aspects, can include one or more of the following features. The item of digital multimedia content can be displayed in the second portion of the user interface and adjacent to the multiple audio objects. The item of digital multimedia content can span a duration. The item of digital multimedia content can be displayed as a video object of a dimension that corresponds to the duration of the item of digital multimedia content. An audio object of the multiple audio objects can be displayed with the dimension that corresponds to the duration of the item of digital multimedia content. Input to extend the dimension of the audio object of the multiple audio objects beyond that dimension can be detected. In response to detecting the input, an audio component that the audio object represents can be extended beyond the duration of the item of digital multimedia content. Each audio component can be a monophonic audio channel. The multiple monophonic audio channels can be organized into one or more stereophonic audio components in response to input. Each stereophonic audio component can include two monophonic audio channels. A feature of a stereophonic audio component can be modified in response to input. The audio content can be modified according to the modified feature of the stereophonic audio component. In a third portion of the user interface, other multiple audio objects can be displayed in response to input. Each of the other multiple audio objects can represent a respective audio component of the multiple audio components. The multiple audio objects can be organized into a single audio object representing the audio content in response to input. The single audio object can be displayed in the second portion of the user interface instead of the multiple audio objects. To modify at least one feature of an audio component that the audio object represents in response to detecting the input to the audio object, a selection of a portion of the audio object can be detected. The portion can span a duration of time. In response to detecting the selection of the portion, at least one feature of the audio component that the portion of the audio object represents can be modified. The input can include input to silence audio in the selected portion.
Another innovative aspect of the subject matter described here can be implemented as a computer-readable medium storing instructions executable by data processing apparatus to perform operations. The operations include displaying an item of digital multimedia content that includes synchronized video content and audio content in a user interface. The video content includes multiple frames and the audio content includes audio from multiple audio channels. The operations include displaying, in the user interface, a subset of the multiple frames included in the video content. The operations include displaying, in the user interface, multiple audio objects that correspond to multiple audio components. The multiple audio components represent a portion of the audio content included in the multiple audio channels and synchronized with the subset of the multiple frames. The operations include detecting a selection of an audio object of the multiple audio objects and, in response, modifying a feature of an audio component that the audio object represents.
This, and other aspects, can include one or more of the following features. The feature can include a decibel level of the audio component. Modifying the feature of the audio component can include decreasing the decibel level of the audio component. The operations can include displaying a name of the audio component that the audio object represents in the user interface, and displaying a modified name of the audio component instead of the name in response to input to modify the name of the audio component. Detecting the selection of the audio object of the multiple audio objects can include detecting a selection of a portion of the audio object of the multiple audio objects. Modifying the feature of the audio component in the portion of the audio object can include disabling all features of a portion of the audio component represented by the portion of the audio object. The operations can include detecting a selection of a portion of the audio object of the multiple audio objects, displaying a border around the portion of the audio object, displaying a horizontal line within the portion at a position that represents a level of the feature, and modifying the feature of the audio component in the portion in response to and according to a modification of the position of the horizontal line. Each audio component can be a monophonic audio channel. The operations can include displaying a first option to organize the multiple monophonic audio channels into one or more stereophonic audio components and a second option to organize the multiple monophonic audio channels into a single component, detecting a selection of either the first option or the second option, and organizing the multiple monophonic audio channels into either one or more stereophonic audio components or the single component based on the selection. Displaying the multiple audio objects in the user interface can include displaying the multiple audio objects below the subset of the multiple frames. A horizontal dimension of each audio object of the multiple audio objects can be substantially equal to a horizontal dimension of a video object in which the subset of the multiple frames is displayed. The operations can include displaying multiple effects objects in the user interface. Each effects object can represent a predefined modification that is applicable to one or more features of an audio component. The operations can include detecting a selection of a particular effects object that represents a particular predefined modification and of a particular audio object that represents a particular audio component. The operations can include modifying one or more features in the particular audio component according to the predefined modification. Modifying a feature of an audio component that the audio object represents can include displaying a modification to the feature as an animation within the audio object. The operations can include receiving input to assign an audio type to an audio component, and assigning the audio type to the audio component in response to the input. The audio type can include at least one of a dialogue, music, or an effect.
A further innovative aspect of the subject matter described here can be implemented as a system that includes one or more data processing apparatus and a computer-readable medium storing instructions executable by the one or more data processing apparatus to perform operations. The operations include displaying, in a user interface, a thumbnail video object that represents a video portion of an item of digital multimedia content. The operations include displaying, in the user interface, multiple audio objects representing multiple audio components included in an audio portion of the item of digital multimedia content. The operations include detecting, in the user interface, a selection of an audio object of the multiple audio objects. The operations include, in response to detecting the selection, modifying a feature of an audio component that the audio object represents, and modifying the audio portion of the item of digital multimedia content according to the modified feature of the audio component.
This, and other aspects, can include one or more of the following features. The operations can include assigning an audio type to each audio component in response to receiving input. Modifying the feature of the audio component that the object represents can include displaying multiple audio types in the user interface, and displaying multiple selectable controls in the user interface. Each selectable control can be displayed adjacent a respective audio type. Modifying the feature can include detecting a selection of a particular selectable control displayed adjacent a particular audio type, and disabling a feature associated with the particular audio type in response to detecting the selection.
An additional innovative aspect of the subject matter described here can be implemented as a computer-implemented method. In a user interface, a first item of digital multimedia content that includes video content received from a first viewing position and audio content received from multiple first audio components is displayed. The audio content is synchronized with the video content. In the user interface, a second item of digital multimedia content that includes the video content received from a second viewing position and the audio content received from multiple second audio components is displayed. In response to detecting input to modify a feature of either a first audio component or a second audio component, the audio content received from the multiple first audio components or from the multiple second audio components, respectively, is modified.
This, and other aspects, can include one or more of the following features. A selection of the first item of digital multimedia content can be detected. In response to detecting the selection of the first item of digital multimedia content, the multiple first audio components can be displayed, and the multiple second audio components can be hidden from display. The video content can include multiple frames. A selection of a portion of the first item of digital multimedia content that includes video content received from the first viewing position can be detected. A subset of the multiple frames can be displayed. The subset can correspond to the portion of the first item of digital multimedia content that includes video content received from the first viewing position. Multiple audio objects, each of which represents a portion of a first audio component that is synchronized with the portion of the first item of digital multimedia content that includes video content received from the first viewing position, can be displayed.
The details of one or more implementations of a user interface for audio editing are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of editing the audio will become apparent from the description, the drawings, and the claims.
Like reference symbols in the various drawings indicate like elements.
This disclosure generally describes computer-implemented methods, computer software, and computer systems for editing items of digital multimedia content using user interfaces. In general, an item of digital multimedia content can include at least two different types of digital multimedia content synchronized with each other. The types of digital multimedia content can include video content, audio content, images, text, and the like. For example, an item of digital multimedia content can include frames of video content that visually represent two persons having a conversation and corresponding audio content that can include each person's voice and any ambient noises. The video content and the audio content are synchronized. For example, the audio content that includes a person's voice corresponds to the person's lip movements in the video content. Each of the video content and the audio content in the item can be edited. For example, a brightness or contrast of the video content can be modified and background music can be added to the audio content. Digital multimedia content added by editing can be synchronized with digital multimedia content already included in the item.
In some implementations, an item of digital multimedia content can be presented in a user interface as one or more objects. For example, video content and audio content can be displayed in the user interface as respective video and audio objects. The audio content can include multiple components of audio. With reference to the example item of digital multimedia content described above, the video of the two persons having the conversation can be represented as a video object, for example, as one or more thumbnails. The voice of each of the two persons having the conversation can be an audio component, which can also be displayed in the user interface as a respective audio object. Editing operations can be performed by providing inputs to the user interface, which can include, for example, selecting, re-sizing, or re-positioning the video objects or the audio objects (or both). As described below, an audio object represents an audio component. An audio component can include one or more audio channels. For example, an audio component can consist of a monophonic audio channel, stereophonic audio channels, or a surround mix. Thus, an audio component can consist of two monophonic audio channels that collectively make up a stereophonic audio component.
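For illustration only, the following is a minimal sketch, in Swift, of how the audio components described above might be modeled; the type and property names are hypothetical and are not prescribed by this disclosure.

```swift
// Hypothetical data model: an audio component wraps one or more
// monophonic channels, and an item pairs video with synchronized audio.
enum ChannelLayout {
    case mono        // one channel
    case stereo      // two channels (left, right)
    case surround    // e.g., a surround mix
}

struct AudioChannel {
    var samples: [Float]            // PCM samples of a single monophonic channel
}

struct AudioComponent {
    var name: String
    var layout: ChannelLayout
    var channels: [AudioChannel]    // 1 for mono, 2 for stereo, more for surround
}

struct MultimediaItem {
    var frameCount: Int             // number of video frames
    var frameRate: Double           // frames per second
    var audioComponents: [AudioComponent]
}
```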
This disclosure describes computer systems that present user interfaces enabling a user to author items of digital multimedia content, and more particularly to edit the audio content included in the items. As described below, the computer systems can configure the user interfaces to provide a consolidated video/audio view that allows an overall view of the video content and audio content included in the item of digital multimedia content. In addition, the computer systems can enable the user to individually edit either the video content or the audio content (or both) in the same consolidated video/audio view instead of in separate views. In the user interfaces, the computer systems can present the video content and multiple components of audio as respective selectable video and audio objects in a timeline that represents a duration of the item of digital multimedia content. Each object can be manipulated in the context of the timeline without leaving the consolidated video/audio view. In this manner, the computer systems can present a consolidated video/audio view of video and audio objects that together represent the video and audio content, and can enable editing of each object individually while maintaining synchronization between the video and audio content.
Examples of editing operations that a user can perform on the audio content, particularly on each audio component, using the user interfaces include trimming the start and end points of audio components, disabling or removing ranges within the audio components, adjusting volume or pan on individual audio components, adding and manipulating effects on individual audio components or on all of the audio content (or both), reviewing the audio included in a component, enabling or disabling certain features for ranges of or for all of the audio components, and the like. As described below, each editing operation can be performed by selecting all or portions of an object that represents an audio component or the audio content.
The computer system 102 can display an item of digital multimedia content 206 in a first portion 204 of the user interface 200a, for example, a portion in which the item 206 can be played back. The item 206 can include video content and audio content that is synchronized with the video content and that includes audio from multiple audio components. In a second portion 210 of the user interface 200a, the computer system 102 can display multiple audio objects (for example, audio object 208a, audio object 208b, audio object 208c, audio object 208d), each representing an audio component of the multiple audio components. The second portion 210 can be a timeline portion that displays either the video content or the audio content (or both) chronologically. The computer system 102 can detect an input to an audio object (for example, audio object 208a) of the multiple audio objects. The input can be a selection of the audio object, for example, of a point in the audio object, a portion of the audio object, or the audio object in its entirety. In response to detecting the input to audio object 208a, the computer system 102 can modify at least one feature of an audio component that audio object 208a represents while maintaining a synchronization of the video content and the audio content.
For example, the item of digital multimedia content 206 can include video content that shows two persons having a conversation. The audio content included in the item 206 can include audio in multiple audio components, including each person's voice in a respective audio component and one or more additional audio components (for example, a background score, ambient noises, voice-overs, voices of persons off-camera, or the like).
Either by default or in response to input, the computer system 102 can display the item of digital multimedia content in the second portion 210 of the user interface 200a, for example, adjacent to the multiple audio objects 208a-208d. The input can include, for example, a drag-and-drop of the video object representing item 206 in the user interface 200a, a selection of a key on the keyboard, voice input, or the like. In some implementations, the computer system 102 can display the item of digital multimedia content as a rectangular video object having a horizontal dimension that corresponds to a duration of playback of the item. The computer system 102 can display each audio component, which spans a duration equal to the duration of playback, as a respective rectangular audio object having the same horizontal dimension as the item.
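As a sketch of the geometry this paragraph implies, assuming a simple linear pixels-per-second mapping (which this disclosure does not prescribe), equal durations yield equal horizontal dimensions:

```swift
// Hypothetical timeline scale: an object's horizontal dimension is
// proportional to the playback duration it represents.
struct TimelineScale {
    var pixelsPerSecond: Double

    func width(forDuration seconds: Double) -> Double {
        seconds * pixelsPerSecond
    }
}

let scale = TimelineScale(pixelsPerSecond: 20)
let itemDuration = 90.0                                        // seconds
let videoObjectWidth = scale.width(forDuration: itemDuration)  // 1800 points
// An audio component spanning the same duration gets the same width,
// so its audio object lines up under the video object in the timeline.
let audioObjectWidth = scale.width(forDuration: itemDuration)  // also 1800
```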
In some implementations, the computer system 102 can organize (for example, reconfigure) the multiple audio objects representing the multiple audio channels into fewer audio objects. With reference to the example described above, the computer system 102 can receive input to organize (for example, reconfigure) the six monophonic audio channels into a single audio component that represents the audio content of the item of digital multimedia content. As shown in user interface 200b, the computer system 102 can then display a single audio object that represents the audio content instead of the multiple audio objects.
In some implementations, the computer system 102 can extend an audio component beyond a duration of the item of digital multimedia content. For example, the computer system 102 can detect a selection of an edge (such as the right edge) of an audio object that represents an audio component, and can further detect a dragging of the selected edge away from the audio object (i.e., toward the right). As shown in user interface 200d, in response to detecting the input, the computer system 102 can extend the audio component that the audio object represents beyond the duration of the item of digital multimedia content, and can display the audio object with a correspondingly extended dimension.
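A minimal sketch of the edge drag described above, assuming a hypothetical geometry model in which only the dragged object's duration changes:

```swift
// Hypothetical geometry of an audio object on the timeline.
struct AudioObjectGeometry {
    var startTime: Double   // seconds; anchored to the video timeline
    var duration: Double    // seconds
}

// Dragging the right edge extends only the duration; the start time is
// untouched, so the component stays synchronized with the video.
func extendRightEdge(of object: inout AudioObjectGeometry,
                     byDraggedPixels pixels: Double,
                     pixelsPerSecond: Double) {
    object.duration += pixels / pixelsPerSecond
}

var music = AudioObjectGeometry(startTime: 0, duration: 90)
extendRightEdge(of: &music, byDraggedPixels: 200, pixelsPerSecond: 20)
// music.duration is now 100 s: 10 s beyond the 90 s item of video content.
```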
As described above, the computer system 102 can modify at least one feature of an audio component that an audio object (for example, the audio object 232b) represents while maintaining a synchronization of the video content and the audio content. To do so, for example, the computer system 102 can detect a selection of the audio object 232b in the user interface 200e. In response to detecting the selection, the computer system 102 can cause a panel 238 to be displayed in the portion 234 of the user interface 200f. The panel 238 can include controls to modify features of the audio component, for example, to adjust volume, to equalize, to pan, or to apply effects.
In some implementations, the computer system 102 can enable a user to modify features of an audio component by providing input to an audio object that represents the audio component. For example, within the audio object 232b, the computer system 102 can display a horizontal line from the left edge to the right edge of the object 232b. The position of the line within the audio object 232b can represent a level of a feature of the audio component. If the feature is the decibel level, for example, then the level can be a minimum decibel level if the horizontal line is positioned near the bottom edge and a maximum decibel level if positioned near the top edge. The computer system 102 can enable a user to adjust the position of the horizontal line using the input devices, and thereby modify the feature of the audio component. To modify the feature of the entire audio component, the user can select the audio object 232b, and then select and move the horizontal line to adjust the feature of the audio component. For example, to decrease the decibel level of the entire audio component, the user can lower the position of the horizontal line displayed within the audio object 232b.
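A sketch of one way the line position might map to a decibel level; the range of -96 dB to +12 dB is an assumption for illustration, not a value from this disclosure:

```swift
import Foundation

// Hypothetical mapping from the horizontal line's vertical position within
// the audio object (0.0 = bottom edge, 1.0 = top edge) to a decibel level.
func decibels(forLinePosition position: Double,
              minDecibels: Double = -96.0,
              maxDecibels: Double = 12.0) -> Double {
    let clamped = min(max(position, 0.0), 1.0)
    return minDecibels + clamped * (maxDecibels - minDecibels)
}

// A decibel change converts to a linear factor applied to each sample.
func applyGain(_ db: Double, to samples: [Float]) -> [Float] {
    let factor = Float(pow(10.0, db / 20.0))
    return samples.map { $0 * factor }
}

// Lowering the line lowers the level: a mid-height position maps to -42 dB here.
let level = decibels(forLinePosition: 0.5)
```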
Alternatively, or in addition, the computer system 102 can modify the feature of a segment of the audio component. To do so, the computer system 102 can detect a selection of a portion of the audio object that spans a duration of time. In response to detecting the selection of the portion, the computer system 102 can modify at least one feature of the audio component that the portion of the audio object represents. For example, the user can select a portion 236 of the audio object 232b and lower the horizontal line within only that portion, thereby decreasing the decibel level of only the segment of the audio component that the portion 236 represents.
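A sketch of such a range-limited modification under the same hypothetical model; only samples inside the selected duration are touched:

```swift
import Foundation

// Hypothetical range edit: apply a gain change only to the samples that
// fall inside the selected portion of the component.
func applyGain(_ db: Double,
               to samples: inout [Float],
               inRange range: Range<Double>,   // seconds
               sampleRate: Double) {
    let factor = Float(pow(10.0, db / 20.0))
    let start = max(0, Int(range.lowerBound * sampleRate))
    let end = min(samples.count, Int(range.upperBound * sampleRate))
    guard start < end else { return }
    for i in start..<end {
        samples[i] *= factor
    }
}

var voice = [Float](repeating: 0.25, count: 48_000 * 10)   // 10 s at 48 kHz
applyGain(-6.0, to: &voice, inRange: 2.0..<5.0, sampleRate: 48_000)
// Only seconds 2 through 5 are quieter; the component's timing is
// unchanged, so synchronization with the video content is maintained.
```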
In some implementations, the computer system 102 can receive input to play back the modified audio component or the audio content modified to reflect the modified audio component. The computer system 102 can play back the audio in only the audio component or in the entire audio content that includes all the audio components. In addition, the computer system 102 can display a vertical line 242 that runs across the video object that represents the item of digital multimedia content and across the multiple audio objects. The position of the vertical line on the audio object can correspond to a beginning of a portion of the audio component that has been modified. As the modified audio component plays back, the computer system 102 can cause the vertical line to move horizontally across the audio object until the end of the playback. When the playback ends, the computer system 102 can display the vertical line at a position on the audio object that corresponds to the end of the portion of the modified audio component, as shown in user interface 200g.
In some implementations, the computer system 102 can assign names to each audio component of the multiple audio components included in the audio content of the item of digital multimedia content. The computer system 102 can display a name of each audio component in the user interface 300b, for example, adjacent to the audio object that represents the audio component. In response to input to modify the name of an audio component, the computer system 102 can display a modified name of the audio component instead of the name.
In some implementations, the computer system 102 can create “mute regions” (also known as “knocked out regions”) in response to user input and disable all features of the audio component within the knocked out regions. To create a knocked out region, the user can select a portion of an audio object as described above. In response, the computer system 102 can disable all features of the portion of the audio component that the selected portion represents, thereby silencing the audio within the knocked out region.
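This disclosure does not prescribe a mechanism for knocked out regions; one non-destructive possibility, sketched below with hypothetical names, is to record the muted ranges and consult them at playback rather than altering samples:

```swift
// Hypothetical non-destructive mute regions: time ranges (in seconds)
// within which the component produces no output during playback.
struct MuteRegions {
    private var regions: [Range<Double>] = []

    mutating func knockOut(_ range: Range<Double>) {
        regions.append(range)
    }

    func isMuted(at time: Double) -> Bool {
        regions.contains { $0.contains(time) }
    }
}

var ambient = MuteRegions()
ambient.knockOut(12.0..<15.5)   // silence, e.g., a cough from 12 s to 15.5 s

// Because the samples themselves are untouched, a knocked out region can
// later be re-enabled without loss.
let mutedMidway = ambient.isMuted(at: 13.0)   // true
let mutedLater = ambient.isMuted(at: 20.0)    // false
```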
The user interface 300e can display a first option to organize the multiple monophonic audio channels into one or more stereophonic audio components and a second option to organize the multiple monophonic audio channels into a single component.
The computer system 102 can detect a selection of either the first option or the second option. For example, the computer system 102 can detect that the user has selected the option “3 Stereo,” which represents input to display the six monophonic audio channels as three stereophonic audio components. In response, the computer system 102 can organize and display the multiple audio channels as multiple stereophonic audio components. As shown in user interface 400b, the computer system 102 can display three audio objects, each representing a respective stereophonic audio component that includes two of the six monophonic audio channels.
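A minimal sketch of the reorganization, assuming consecutive monophonic channels pair as left/right; the pairing rule is an assumption for illustration:

```swift
// Hypothetical channel grouping: "3 Stereo" pairs six monophonic channels
// into three stereophonic components; the alternative groups all six into
// a single component.
struct Channel { var name: String }

func pairIntoStereo(_ monoChannels: [Channel]) -> [[Channel]] {
    // Consecutive channels become left/right pairs; an odd trailing
    // channel would remain monophonic (not handled here).
    stride(from: 0, to: monoChannels.count - 1, by: 2).map {
        [monoChannels[$0], monoChannels[$0 + 1]]
    }
}

let mono = (1...6).map { Channel(name: "mono \($0)") }
let stereoComponents = pairIntoStereo(mono)
// -> [[mono 1, mono 2], [mono 3, mono 4], [mono 5, mono 6]]
let singleComponent = [mono]   // all six channels organized as one component
```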
The computer system 102 can detect a selection of a particular effects object 514 that represents a particular predefined modification and of a particular audio object 516 displayed in the user interface 500a that represents a particular audio component. For example, the user can perform a drag-and-drop operation by selecting the effects object 514, dragging the effects object 514 across the user interface 500a, and dropping the effects object 514 over the audio object 516. In response to this input, the computer system 102 can modify one or more features of the particular audio component according to the predefined modification. For example, to visually communicate the modification of the audio component represented by the audio object 516 according to the effect represented by the effects object 514, the computer system 102 can display the audio object 516 so that it is visually distinguishable from other audio objects representing other audio components. For example, the computer system 102 can display a border around the audio object 516 or display the audio object 516 in a lighter color than other audio objects.
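A sketch of how a dropped effects object might be applied; the Effect type and the "Less Bass" transform are hypothetical stand-ins, not this disclosure's implementation:

```swift
// Hypothetical effect application triggered by the drag-and-drop gesture.
struct Effect {
    var name: String
    var transform: ([Float]) -> [Float]
}

struct EditableAudioComponent {
    var name: String
    var samples: [Float]
    var appliedEffects: [String] = []

    mutating func apply(_ effect: Effect) {
        samples = effect.transform(samples)
        appliedEffects.append(effect.name)  // can drive the visual highlight
    }
}

// A crude stand-in for "Less Bass": emphasize sample-to-sample change.
let lessBass = Effect(name: "Less Bass") { samples in
    guard samples.count > 1 else { return samples }
    var out = samples
    for i in 1..<out.count { out[i] = samples[i] - 0.5 * samples[i - 1] }
    return out
}

var component = EditableAudioComponent(name: "mono 3",
                                       samples: [0.1, 0.3, 0.2, 0.4])
component.apply(lessBass)   // the drop over the audio object invokes this
```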
In some implementations, the computer system 102 can enable a user to edit and key frame an effect applied to the audio component. As shown in the user interface 500b, for example, the computer system 102 can display controls to edit the parameters of an applied effect and to add key frames that vary those parameters over the duration of the audio component.
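One common way to key frame an effect parameter, sketched below under the assumption of linear interpolation between key frames; this disclosure does not specify the interpolation:

```swift
// Hypothetical key framing: an effect parameter interpolated linearly
// between (time, value) key frames across the component's duration.
struct Keyframe {
    var time: Double    // seconds
    var value: Double   // parameter value, e.g., a wet/dry mix from 0 to 1
}

func parameterValue(at time: Double, keyframes: [Keyframe]) -> Double {
    let sorted = keyframes.sorted { $0.time < $1.time }
    guard let first = sorted.first, let last = sorted.last else { return 0 }
    if time <= first.time { return first.value }
    if time >= last.time { return last.value }
    for (a, b) in zip(sorted, sorted.dropFirst()) where time < b.time {
        let t = (time - a.time) / (b.time - a.time)
        return a.value + t * (b.value - a.value)
    }
    return last.value
}

// Fade the effect in over four seconds, hold, then fade it out.
let frames = [Keyframe(time: 0, value: 0),
              Keyframe(time: 4, value: 1),
              Keyframe(time: 8, value: 1),
              Keyframe(time: 10, value: 0)]
let midFade = parameterValue(at: 2, keyframes: frames)   // 0.5
```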
The computer system 102 can additionally enable a user to apply multiple effects to the audio content. More specifically, the computer system 102 can receive a first input to apply a first effect (“Less Bass”) to only an audio component (“mono 3”) included in the audio content and a second input to apply a second effect (“Less Treble”) to the entire audio content collectively represented by all the audio components. In response to receiving the first input, the computer system 102 can modify features of only that audio component according to the first effect. In response to receiving the second input, the computer system 102 can modify features of the entire audio content according to the second effect. As shown in the user interface 500c, the computer system 102 can display the “Less Bass” effect applied to the audio object that represents the “mono 3” audio component and the “Less Treble” effect applied to the audio content as a whole.
As shown in user interface 600a, the computer system 102 can receive input to assign an audio type (a “role”), for example, dialogue, music, or an effect, to each audio component, and can display a panel 604 that lists the roles together with a selectable control displayed adjacent to each role.
Using controls in the panel 604, audio components that are assigned a certain role (or roles) can be controlled, for example, turned off. For example, in the panel 604, the control “Music” has been disabled (i.e., de-selected), whereas the controls “Video,” “Dialogue,” and “Effects” are enabled (i.e., selected). The computer system 102 can turn off the audio component or components that have been assigned “Music” as the role while leaving the remaining audio component or components enabled.
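A sketch of role-based control with hypothetical names; de-selecting a role excludes every component assigned that role from playback:

```swift
// Hypothetical role-based filtering of audio components during playback.
enum Role: Hashable { case dialogue, music, effects }

struct RoledComponent {
    var name: String
    var role: Role
}

func audibleComponents(_ components: [RoledComponent],
                       enabledRoles: Set<Role>) -> [RoledComponent] {
    components.filter { enabledRoles.contains($0.role) }
}

let components = [RoledComponent(name: "actor 1", role: .dialogue),
                  RoledComponent(name: "actor 2", role: .dialogue),
                  RoledComponent(name: "score", role: .music),
                  RoledComponent(name: "door slam", role: .effects)]

// With "Music" de-selected in the panel, the score is turned off while
// the dialogue and effects components remain enabled.
let audible = audibleComponents(components,
                                enabledRoles: [.dialogue, .effects])
```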
For example, the computer system 102 can detect a selection of the first item 1202. The selection represents input to edit audio content included in the first item 1202, i.e., content captured from the first angle. The computer system 102 can additionally display a first audio object 1210 that represents the audio content received from the first viewing position and a second audio object 1212 that represents the audio content received from the second viewing position in a portion 1208 of the user interface. The computer system 102 can additionally display the video content received from the selected viewing position and the audio content received from the selected viewing position in respective video objects and audio objects in the portion 1214 of the user interface 1200a. Thus, the computer system 102 can enable a user to modify audio content from any viewing position (for example, the first viewing position) while viewing video content from the same viewing position (i.e., the first viewing position) or from the other viewing position (i.e., the second viewing position). The computer system 102 can similarly enable the user to modify features of audio components captured from more than two viewing positions, i.e., more than two angles.
As described above, the computer system 102 can display the entire audio content received from the first viewing position and the second viewing position as a single audio object. The computer system 102 can enable a user to modify features of the audio content by providing input to the single audio object. In some implementations, the computer system 102 can detect a selection of the first audio object 1210 or the second audio object 1212. In response, the computer system 102 can display audio objects that represent the first audio components or audio objects that represent the second audio components, respectively. The computer system 102 can enable the user to modify features of each audio component by providing input to a respective audio object that represents the audio component. The computer system 102 can display only one set of audio components at a time, resulting in the first audio components being hidden from display when the second audio components are selected for display. Alternatively, the computer system 102 can display both sets of audio components simultaneously in the user interface.
In some implementations, the video content can include multiple frames. The computer system 102 can detect a selection of a portion of the first item of digital multimedia content 1202. In response, the computer system 102 can display a subset of the multiple frames that corresponds to the portion of the first item 1202. The computer system 102 can additionally display multiple audio objects, each of which represents a portion of a first audio component that is synchronized with the portion of the first item 1202.
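A sketch of the multi-angle selection model with hypothetical names; selecting an angle determines which set of audio components is displayed, while both angles share the same timeline:

```swift
// Hypothetical multi-angle model: each viewing position carries its own
// audio components; selecting an angle chooses which set is displayed.
struct Angle {
    var name: String
    var audioComponents: [String]
}

struct MultiAngleItem {
    var angles: [Angle]
    var selectedIndex: Int = 0

    // Components of the selected angle are shown; the others are hidden
    // (or, alternatively, all sets can be shown simultaneously).
    var displayedComponents: [String] {
        angles[selectedIndex].audioComponents
    }
}

var item = MultiAngleItem(angles: [
    Angle(name: "first viewing position", audioComponents: ["boom", "lav 1"]),
    Angle(name: "second viewing position", audioComponents: ["boom", "lav 2"]),
])
let firstAngleAudio = item.displayedComponents   // first angle's components
item.selectedIndex = 1
let secondAngleAudio = item.displayedComponents  // second angle's; first hidden
```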
The term “computer-readable medium” refers to a medium that participates in providing instructions to processor 1402 for execution, including without limitation, non-volatile media (e.g., optical or magnetic disks), volatile media (e.g., memory) and transmission media. Transmission media includes, without limitation, coaxial cables, copper wire and fiber optics.
Computer-readable medium 1412 can further include operating system 1414 (e.g., a Linux® operating system) and network communication module 1416. Operating system 1414 can be multi-user, multiprocessing, multitasking, multithreading, real time, etc. Operating system 1414 performs basic tasks, including but not limited to: recognizing input from and providing output to devices 1406, 1408; keeping track of and managing files and directories on computer-readable mediums 1412 (e.g., memory or a storage device); controlling peripheral devices; and managing traffic on the one or more communication channels 1410. Network communications module 1416 includes various components for establishing and maintaining network connections (e.g., software for implementing communication protocols, such as TCP/IP, HTTP, etc.).
Architecture 1400 can be implemented in a parallel processing or peer-to-peer infrastructure or on a single device with one or more processors. Software can include multiple software components or can be a single body of code.
The described features can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. A computer program is a set of instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result. A computer program can be written in any form of programming language (e.g., Objective-C, Java), including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, a browser-based web application, or other unit suitable for use in a computing environment.
Suitable processors for the execution of a program of instructions include, by way of example, both general and special purpose microprocessors, and the sole processor or one of multiple processors or cores, of any kind of computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memories for storing instructions and data. Generally, a computer will also include, or be operatively coupled to communicate with, one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
The features can be implemented in a computer system that includes a back-end component, such as a data server, or that includes a middleware component, such as an application server or an Internet server, or that includes a front-end component, such as a client computer having a graphical user interface or an Internet browser, or any combination of them. The components of the system can be connected by any form or medium of digital data communication such as a communication network. Examples of communication networks include, e.g., a LAN, a WAN, and the computers and networks forming the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device). Data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.
A system of one or more computers can be configured to perform particular actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.
A number of implementations of the invention have been described. Nevertheless, it will be understood that various modifications can be made without departing from the spirit and scope of the invention.