Social media has helped to popularize sharing events and occurrences in the lives of individuals. Shared events may include travel, celebrations, parties, and the like, while the items shared may include pictures, messages, short clips comprising audio and video, movies, and the like. For many individuals, music and performing are important parts of life. Facilitating the sharing of music and performances, and of related events, would resonate with many persons.
For example, many persons would enjoy sharing their own musical contributions. Singing along to a song, with musical background, is the basis for many popular television shows, and many persons would like to duplicate a similar effort on their part, on a much less grand scale. Such sharing activities would be especially welcome if they could be integrated with personal mobile devices, such as smart phones and the like, in what might be characterized as a “musical selfie”. Sharing performances such as renditions of popular songs, unaccompanied singing (a capella), and acting, and the like, may also be desired. Greater convenience in such sharing, with opportunities to produce musical works that show the performer in a good light, would likely be well-received and would likely prove to be popular.
Disclosed are techniques for adding user-generated audio and/or video content to a multi-track clip simultaneously with the user listening to and viewing a playback of the clip. The clip may comprise, for example, a recorded, commercially-available, professional music performance. The clip may comprise a previously recorded performance, or even no performance at all (i.e., a clip comprising a blank track). The user's contributed performance may comprise singing along with the professional performance. Convenience is served by providing an application that can be executed on a local system comprising a user's mobile device, such as a smart telephone or tablet device. The application supports storing the combined user performance and clip at a local device as a composite performance. The stored, combined user performance and clip can then be used as a new multi-track composite clip to which, in turn, a user can add new audio and/or video content simultaneously with the user listening to and viewing a playback of the stored composite clip. The resulting combination of the new multi-track composite clip and new user contribution of audio and/or video content can similarly comprise the basis for yet another clip, to which a user can add new audio and/or video content, and so forth, repeatedly, if desired. In this way, multiple user performances can be combined with pre-recorded composite clips to produce a new composite clip.
A prior composite clip may comprise, for example, a recorded, commercially-available, professional music performance. A prior composite clip may comprise, for example, a user performance, such as a non-singing user performance, such that the new composite clip may appear as though the user is “lip-synching” to the preceding audio/video performance. In another example, multiple user performances may be cumulatively added to a composite clip either in parallel or serially. In this way, multiple user performances may be combined to produce combined performances that demonstrate harmony, or a capella renditions. Effects processing may also be applied to the user contribution. The effects processing may comprise audio effects, or video effects, or a combination of both audio and video effects. The clip may include separate tracks for an instrumental portion of the clip and a lead vocals portion of the clip. As noted above, the clip may include separate tracks for multiple user contributions. Such multi-track input facilitates a user listening to a recognizable professional performance, for example through earphones or headphones at a mobile device, while recording the user's performance, to replace the lead vocals portion of the professional performance with the recorded user performance. The effects processing can be used to improve the user's performance, increasing user satisfaction. The composite performance may be previewed and can be sent to a computer device over a computer network for sharing with other users.
The clips may be selected from a library of available clips. For example, the clip library may include music clips, movie clips, spoken word clips, video clips, and so forth. The effects processing may be selected from a library of available effects, to be applied to the user performance. The effects processing may provide adjustments such as reverberation, tone adjustment, pitch adjustment, and other audio and video effects, as described further below. The selection of clips and of effects processing by users can be tallied, and statistics relating to the selections and their popularity may be used to improve the relevance of available clips and effects processing. The recorded clips may include previously submitted composite performances, to layer additional user performances on top of other performances (vertical layering) or alongside other performances, before or after someone else's performance in time (horizontal layering). Viewing the previously submitted performances and applying effects processing, and previewing the results, can be performed remotely, at the user's device, so that no downloads of performances are necessary.
Other features and advantages of the present invention should be apparent from the following description of preferred embodiments that illustrate, by way of example, the principles of the invention.
The techniques disclosed herein enable a user to record a user performance with a computer device, such as a smart phone or portable music device or other mobile device. The user performance may broadly encompass user generated content such as singing along to background vocals and/or instrumental playing, dramatic acting, spoken word, lip synching, a capella renditions, physical activity, competition, and so forth. Combining such user performances with previously stored composite performances may be implemented through an application installed at the user's mobile device. With the installed application, the user can listen to a pre-recorded multi-track clip and can add user-generated audio and video performances to the clip, while listening and viewing playback of the clip. The clip may comprise, for example, a recorded, commercially-available, professional music performance. The user's added performance may comprise, for example, singing along with the professional performance. The installed application permits convenient adding and editing of the user's performance to the original clip, replacing a lead vocal or similar portion of the original clip with the user's performance and producing a composite clip.
The composite clip can be uploaded to a social media sharing site, for greater distribution of the user's composite clip. The clip typically will correspond to a recorded, commercially-available, professional artist performance, and the artist performance on which the clip is based may comprise, for example, a song or other complete artist performance that is commercially available, or may comprise a portion of the artist performance, such as a chorus or “hook” from a song. The clip for use with the application disclosed herein, however, departs from a typical commercially-available artist performance in that the clip for use with the application may include separate tracks for an instrumental portion of the clip and for a lead vocals portion of the clip. Alternatively, the clip may include separate tracks for an instrumental portion of the clip as well as a background vocals portion of the clip, and a lead vocals portion of the clip. A multi-track clip permits a user to listen to a recognizable professional performance, while recording the user's performance (i.e., content contribution) at the mobile device. For example, a user may optionally listen to a previously recorded performance, such as a commercially available recording, through earphones or headphones at the user mobile device. In this way, the user vocal performance may effectively replace the lead vocals portion of the professional performance for the composite clip, while leaving the remainder of the professional performance intact. As used herein, a “clip” will be understood to refer to a multi-track clip with different performances recorded in different tracks of the clip. For example, one track of the multi-track clip may comprise a professional artist contribution, which will be replaced with a user performance, and a separate track may comprise background audio, vocals, and/or instrumentals.
As noted above, the separate tracks may comprise multiple user performances, to create harmony performances, instrumental works, lip synching, a capella renditions, and the like. As used herein, a “composite clip” will be understood to refer to a clip in which content such as a user's performance has been combined with a performance of the original clip. For example, depending on the configuration of the original clip, the composite clip may comprise a separate user performance vocal track, instrumental/background vocal track, and user video track.
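The definitions of a “clip” and a “composite clip” given above can be pictured as a small data structure. The following is only a minimal sketch, not the application's actual implementation: the class names `Track` and `Clip`, the function `make_composite`, and the role labels such as `"lead_vocal"` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Track:
    role: str                    # hypothetical label, e.g. "lead_vocal"
    samples: Tuple[float, ...]   # stand-in for the audio or video payload


@dataclass(frozen=True)
class Clip:
    title: str
    tracks: Tuple[Track, ...]

    def roles(self):
        return [t.role for t in self.tracks]


def make_composite(clip, user_tracks, replaced_role="lead_vocal"):
    """Drop the track being replaced (if any) and append the user's tracks."""
    kept = tuple(t for t in clip.tracks if t.role != replaced_role)
    return Clip(title=clip.title, tracks=kept + tuple(user_tracks))


original = Clip("demo song", (
    Track("lead_vocal", (0.1, 0.2)),
    Track("instrumental", (0.3, 0.4)),
))
composite = make_composite(original, [
    Track("user_vocal", (0.5, 0.6)),
    Track("user_video", ()),
])
print(composite.roles())  # ['instrumental', 'user_vocal', 'user_video']
```

Because the result is itself a `Clip`, it could serve as the starting clip for a further contribution, for example by calling `make_composite` again with `replaced_role=None` so that no existing track is dropped.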
The composite clip, comprising the combined user performance (audio and video) and the background/instrumental track, can be stored at the local user mobile device, and effects processing may be applied to the user performance track. The effects processing can be used to improve the user performance. In some embodiments, one or more of the effects processing is automatically applied, in real time, as the user performance is recorded. After the recording is completed, the composite performance may be previewed. Additional processing effects may be applied, or extracted, and observed in the preview operation. Once the user is satisfied with the composite performance, the composite clip with combined user performance, instrumental/backing vocals, and video segment, can be sent to a computer device over a computer network for sharing with other users.
The user interface presented for guiding the user through the performance and sharing provides a user experience that is convenient and enjoyable. A typical scenario involves a music server or other source of clips at a first computer device, and a user at a second computer device, such that the clips at the first computer device can be viewed while a user performance is recorded at the second device. In this way, the recorded user performance and the instrumental/background vocals of the original clip can be combined into a composite performance, for sharing with other users.
Thus, the clip tracks, comprising the music data of the selected clip, may include a track of a lead vocal and a track of instrumental and/or other backup vocals. Additional data in the clip may include metadata for clip identification, clip format configuration, and the like, as well as music information such as song lyrics, tone information, pitch level and timing information, timbre information, and the like. The metadata may be stored in a header portion of one or more of the tracks, or the metadata may be stored in parallel with the music data of a track, or may be stored in a combination of the two. The clip may comprise, for example, an enhanced media file such as described in U.S. patent application Ser. No. 13/489,393 entitled “Enhanced Media Recordings and Playback” by Robert D. Taub, et al., filed Jun. 5, 2012. As described further below, the selected clip may comprise a previously submitted composite clip that includes a prior user performance and the selected clip. The previously submitted composite clip may comprise a clip without a lead vocals track.
At the next operation, indicated by the
The user performance to be recorded and combined with the previously recorded tracks of the selected clip will typically involve both audio and video elements. For example, the user's computer device may comprise a smart phone with a rear-facing camera and a microphone. In this way, the user can record video of the user's performance and audio of the user's performance, at the same time. The selected clip may include instrumental/background vocals of a professional and/or commercially available recording. If the user's computer device has a forward-facing camera and a rear-facing camera, then the user has the option of recording video of the performance that is viewed through the rear-facing camera, which is the usual scenario, or recording the performance that is viewed through the forward-facing camera. When using headphones or earphones, the user will be able to hear the professional performance lead vocals of the clip, but the user's recorded performance will be without the professional lead vocals, effectively replacing the professional lead vocals with the user's performance. The recording of the user's performance is initiated at the user's second computer device in response to a store command or record command or similar command of the application, so recording will not begin until the user is ready.
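The distinction drawn above, between what the user hears in the headphones and what ends up in the recording, amounts to routing the same tracks to two destinations. The sketch below is illustrative only; the function name `route_tracks` and the role labels are assumptions and do not come from the application itself.

```python
def route_tracks(clip_roles, lead_role="lead_vocal"):
    """Return (monitor, recorded) lists of track roles.

    monitor:  everything in the clip, including the professional lead vocal,
              as heard through headphones or earphones while performing.
    recorded: the clip without the lead vocal, plus the user's own audio
              and video (hypothetical role names).
    """
    monitor = list(clip_roles)
    recorded = [r for r in clip_roles if r != lead_role]
    recorded += ["user_audio", "user_video"]
    return monitor, recorded


monitor, recorded = route_tracks(["lead_vocal", "instrumental", "backing_vocals"])
print(monitor)   # ['lead_vocal', 'instrumental', 'backing_vocals']
print(recorded)  # ['instrumental', 'backing_vocals', 'user_audio', 'user_video']
```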
In the next operation, at the box 130, the user is able to select and preview the effects processing. The application will cause the device, in response to a preview command, to generate a combined performance comprising the recorded user performance and at least one of the one or more tracks of music data. As noted above, the clip tracks of music data that will be recorded and stored at the second device will typically include all the tracks of the clip, except for the lead vocal track of the clip. As noted above, the clip can be obtained with the lead vocal comprising one of the tracks, with other instrumentation and background vocals on one or more other tracks. A variety of effects processing may be applied to the composite clip to produce a new composite clip, which may be stored. Available to the user are audio effects, or video effects, or a combination of both. The effects may comprise, for example, effects such as reverberation, echo, gloss, pitch, harmony, helium, and melting or dissolving effects. Many additional effects may be implemented to modify the tracks of the clip, effects such as muting a backing track, flanger, ring modulation, stereo-panning automation, video filters (e.g., spotlight, sepia, black & white, posterizing, and so forth), telephone audio processing (e.g., reduction of bandwidth permitted for a clip), “bit crusher” (i.e., reduction of dynamic range), stutter, wah wah, tape noise and recording hiss, crowd noise, chorus, shouts, “helium balloon” effects, multi-band compression, tempo-sync effects (e.g., tremolo, auto-pan, filter-sweep), amplification overdrive and distortion, bullhorn, radio, data-driven vocal layering, “ping pong” delay, duets, mashup of tracks and sources, arpeggiator, reverse, formant “boost” for a vocalist, and vocoder (e.g., Imogen Heap, “Hide & Seek”).
Additional effects available to the user may include converting from color images to black & white images, resolution modification (both higher and lower), clips comprising images and video from a pre-stored library of images and video, multiple screens such as tiles in a window that are presented sequentially or simultaneously, lighting changes, and the like.
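To make the notion of selectable effects processing more concrete, the following sketch shows two simple audio effects and a way of chaining them over a list of samples. It is a simplified illustration under stated assumptions, not the effects engine of the application: `echo` is a single feed-forward delay tap, `bit_crush` is modeled here as coarse quantization of sample values, and the names `echo`, `bit_crush`, and `apply_chain` are hypothetical.

```python
def echo(samples, delay, decay):
    """Add one delayed, attenuated copy of the signal to itself."""
    out = list(samples)
    for i in range(delay, len(samples)):
        out[i] += decay * samples[i - delay]
    return out


def bit_crush(samples, bits):
    """Quantize samples in [-1.0, 1.0] to a coarse grid (assumed model)."""
    levels = 2 ** (bits - 1)
    return [round(s * levels) / levels for s in samples]


def apply_chain(samples, effects):
    """Apply the selected effects in order; each effect maps samples to samples."""
    for effect in effects:
        samples = effect(samples)
    return samples


print(echo([1.0, 0.0, 0.0, 0.0], delay=2, decay=0.5))  # [1.0, 0.0, 0.5, 0.0]
print(bit_crush([0.3, 0.8, -0.3], bits=2))             # [0.5, 1.0, -0.5]
```

Representing each effect as a plain function from samples to samples keeps effects independent of one another, so a selection made during preview can be reordered or removed without touching the raw recording.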
As part of the preview processing, the user may audition the recorded user performance for satisfaction, and also may select one or more effects processing to be applied to the user performance. Such operations are indicated in
As noted above, at the box 135, after the user finishes the performance and completes review of the composite clip, the user may audition the recorded user performance for satisfaction, and also may select one or more effects processing to be applied to the user performance. If the user is not satisfied with the user's performance upon viewing the combined tracks from the preview operation, then the user may decide to apply different effects processing, remove effects processing, or make any other adjustments, as desired. A decision to apply additional/different effects processing, an affirmative outcome at the box 135, will result in the application returning to the preview operation at the box 130, after recording and/or applying the effects processing to the recorded user performance.
After the user is satisfied with the user performance, as observed in the preview operation, the user can indicate completion of effects processing, and at the “complete” decision outcome at the decision box 135, the application can store the combined performance (i.e., the composite clip) at the second computer device (i.e., the user mobile computer device). The storing of the composite performance at the user device is typically performed in response to a store command at the second computer device. The combined performance comprises the user performance, audio and video, the backing vocals, any instrumentation, and the like. The combined performance is stored as a single track of audio, with left and right audio channels, and combined with the user's video track, with the effects processing applied. That is, the application responds to the store command by applying the effects processing, combining the processed user track of audio or audio-video, and saving the combined performance to memory of the second computing device. Thus, the combined performance is suitable for uploading to sharing Web sites such as “YouTube” and the like.
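The response to the store command described above (apply the effects, combine the processed user track with the remaining clip tracks, and save a single left/right audio track alongside the video) can be sketched as follows. This is a minimal illustration, assuming mono user audio and sample lists of floats; the function name `store_composite` and the dictionary layout of the result are hypothetical.

```python
def store_composite(user_audio, backing_left, backing_right, user_video, effects):
    """Apply effects to the user audio, then mix it into both stereo channels."""
    processed = list(user_audio)
    for effect in effects:
        processed = effect(processed)

    length = max(len(processed), len(backing_left), len(backing_right))

    def sample(seq, i):
        # Treat a track that has ended as silence.
        return seq[i] if i < len(seq) else 0.0

    left = [sample(backing_left, i) + sample(processed, i) for i in range(length)]
    right = [sample(backing_right, i) + sample(processed, i) for i in range(length)]
    return {"audio": {"left": left, "right": right}, "video": user_video}


def halve(xs):
    return [x * 0.5 for x in xs]


stored = store_composite(
    user_audio=[0.5, 0.5],
    backing_left=[0.25, 0.0, 0.125],
    backing_right=[0.0, 0.25, 0.125],
    user_video="frames",  # placeholder for the user's video track
    effects=[halve],
)
print(stored["audio"]["left"])   # [0.5, 0.25, 0.125]
print(stored["audio"]["right"])  # [0.25, 0.5, 0.125]
```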
The enhanced features disclosed herein, such as audio processing of the user generated performance, may be implemented using enhanced media files, such as described in the aforementioned U.S. patent application Ser. No. 13/489,393 entitled “Enhanced Media Recordings and Playback” by Robert D. Taub, et al., filed Jun. 5, 2012. The processing of the file to produce the enhanced features may be achieved by an enhanced media file application that recognizes the requested effects and is configured to implement the requested effects. The enhanced media file may comprise, for example, album tracks or movie chapters comprising tracks or chapters of a conventional audio or video (multimedia) work, supplemented with enhanced features such as those disclosed herein, including recorded user input, real-time vocal effects, and the like. That is, the conventional audio or video work may be a commercially available recording that is separately available, whereas the present disclosure describes an enhanced version of the commercially available recording, having all the material available on the commercially available recording, and also having the enhanced features disclosed herein.
The enhanced media file that is stored by the system typically comprises an album track that is produced from a number of previously recorded files that define audio tracks or stems. For a conventional audio file, a two-channel left and right track (L/R stereo) file is created from source audio files, from which a master stereo file can be created. This stereo master may comprise, for example, a conventional stereo music file that is commercially available to listeners, such as for programming recorded onto physical media such as CD, DVD, BD recordings or vinyl records, or such as electronic format programming available through online retail sales such as the Web site of Amazon.com, Inc. of Seattle, Wash., USA or such as the “iTunes Music Store” of Apple Inc.
Any number of tracks using various channel layouts according to the file format being used can be encoded into the stereo master. The enhanced file format may be designated by a file suffix that indicates type. For example, the enhanced file format may comprise an “m4a” file format as described in the aforementioned U.S. patent application Ser. No. 13/489,393 entitled “Enhanced Media Recordings and Playback” by Robert D. Taub, et al., filed Jun. 5, 2012. The “m4a” file type may include channel layouts that comprise standard audio channel configurations, multichannel joint-surround encodings, and sequential encodings. The tracks to be encoded may be provided by the user, or by recording artists, media distributors, record labels, sponsors, and the like. Most recorded works are sourced from multiple tracks such as vocal and music (instrumental) tracks. The multiple tracks are mixed down during the mastering process and typically a final two-track (stereo) work is produced. The final work according to the file format can sometimes have multiple tracks that are automatically mixed down by the playback application from the multiple tracks into two-channel (stereo) form for presentation to the listener. For example, the “iOS” platform operating system for mobile devices from Apple Inc. does not currently allow for direct access to individual tracks, but rather utilizes mixed-down stereo samples. This restriction renders the various channel layouts available to m4a files useless for the purposes described herein. As a consequence, the conventional master stereo tracks are placed in their typical position in the enhanced media file as would be expected by a conventional player application for a conventional media file. Additional information, such as m4a metadata tags, is also placed in its typical position in the enhanced media file as would be expected by a conventional player application.
This arrangement supports backwards compatibility of the enhanced media file with conventional playback devices.
Thus, the enhanced media file described herein is produced starting with a collection of audio files, a two-channel L/R data file, and a master m4a file that are used for producing the conventional album track. As noted, the tracks of the master m4a file are placed in the enhanced media file in locations corresponding to their typical position in the corresponding conventional commercially available album track.
All of the additional multimedia tracks and feature data that provide the enhanced effects features disclosed herein are appended to the user-data section of the m4a enhanced media file. Because of the additional data for the enhanced features, the enhanced media file is a larger file than would otherwise be needed for the data of a conventional file. The increased file size, however, is necessary for providing the enhanced features, and the additional data is not of a size that creates any significant problem or difficulty for processing by the device. It should be noted that the enhanced features as described herein could also be generated as described even with an operating system that grants full access to individual tracks of songs and movie chapters.
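The layout described above, with the conventional content in its usual position and the additional tracks and feature data appended as user data, can be illustrated with a toy container. This sketch is not an m4a writer: it only borrows the size-plus-four-character-type box framing used by MP4-family files, and it appends the extra data as a top-level box for brevity, whereas real files are more elaborate. The function names and payload contents are hypothetical.

```python
import struct


def make_box(box_type, payload):
    """Frame a payload as: 4-byte big-endian total size, 4-byte type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def iter_boxes(data):
    """Yield (type, payload) for each top-level box in the byte string."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            break  # malformed or truncated box; stop reading
        yield box_type, data[offset + 8:offset + size]
        offset += size


def append_enhanced(conventional_file, extra_payload):
    """Leave the conventional bytes untouched and append the extra data."""
    return conventional_file + make_box(b"udta", extra_payload)


conventional = make_box(b"ftyp", b"M4A ") + make_box(b"mdat", b"stereo-master")
enhanced = append_enhanced(conventional, b"stems+effects")

print([box_type for box_type, _ in iter_boxes(enhanced)])
# [b'ftyp', b'mdat', b'udta']
```

Because the conventional bytes come first and are unchanged, a reader that stops at, or skips past, boxes it does not recognize still finds the stereo master where it expects it, which is the backwards-compatibility property the description relies on. The enhanced file is larger by exactly the size of the appended box.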
At the next box 140 of device operations for producing the composite performances and sharing using the installed application, the user selects the submit display button. The submit button causes the application to send the combined performance over the computer network to another computing device, such as a device at a sharing site or social media site. At approximately the same time, the application causes the unprocessed user vocal track and the unprocessed user video track to be sent to the application developer's site, along with metadata for song and configuration identification. Saving such unprocessed, or “raw” elements, enables efficient storage of user submissions and enables relatively easy recapture or re-creation of the user's submission, by applying the effects processing to the raw audio and video files.
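The benefit of keeping the unprocessed elements together with metadata is that the processed version can be regenerated on demand. The sketch below illustrates that idea under simple assumptions: the effect registry `EFFECTS`, the effect names, the clip identifier format, and the functions `build_raw_submission` and `recreate_processed_audio` are all hypothetical and stand in for whatever the application actually sends.

```python
# Hypothetical registry mapping effect names (as stored in metadata) to functions.
EFFECTS = {
    "half_gain": lambda xs: [x * 0.5 for x in xs],
    "reverse": lambda xs: list(reversed(xs)),
}


def build_raw_submission(raw_audio, raw_video, clip_id, effect_names):
    """Bundle the unprocessed user tracks with identifying metadata."""
    return {
        "raw_audio": list(raw_audio),
        "raw_video": raw_video,
        "metadata": {"clip_id": clip_id, "effects": list(effect_names)},
    }


def recreate_processed_audio(submission):
    """Re-create the processed audio by replaying the recorded effect names."""
    audio = submission["raw_audio"]
    for name in submission["metadata"]["effects"]:
        audio = EFFECTS[name](audio)
    return audio


submission = build_raw_submission(
    raw_audio=[0.5, 1.0],
    raw_video="frames",          # placeholder for the raw video track
    clip_id="clip-001",          # hypothetical identifier
    effect_names=["half_gain", "reverse"],
)
print(recreate_processed_audio(submission))  # [0.5, 0.25]
```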
When users submit their performances, the metadata indicating the effects processing that was applied can be used to collect data identifying which effects processes were selected and applied to user performances across a plurality of computing devices. In a similar way, data can be collected that identifies which clips were selected across the plurality of computing devices.
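Collecting such selection data can be as simple as counting occurrences in the submitted metadata. The following sketch assumes each submission carries a metadata record with a clip identifier and a list of effect names; the field names `clip_id` and `effects` and the function `tally_selections` are hypothetical.

```python
from collections import Counter


def tally_selections(metadata_records):
    """Count how often each clip and each effect appears across submissions."""
    clip_counts = Counter()
    effect_counts = Counter()
    for record in metadata_records:
        clip_counts[record["clip_id"]] += 1
        effect_counts.update(record["effects"])
    return clip_counts, effect_counts


records = [
    {"clip_id": "clip-001", "effects": ["reverb", "echo"]},
    {"clip_id": "clip-002", "effects": ["reverb"]},
    {"clip_id": "clip-001", "effects": []},
]
clip_counts, effect_counts = tally_selections(records)
print(clip_counts.most_common(1))    # [('clip-001', 2)]
print(effect_counts.most_common(1))  # [('reverb', 2)]
```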
The recorded clips may include previously submitted composite performances, which may be made available for public viewing and selection for recording. The previously submitted performances, upon selection, may be used to layer additional user performances on top of other performances (vertical layering) or alongside other performances, before or after in time (horizontal layering). Viewing the previously submitted performances and applying effects processing, and previewing the results, can be performed remotely, so that no downloads of performances are necessary. That is, the previously submitted user performances may be viewed, but no copies will be sent to a requesting user, thus avoiding privacy and property rights issues.
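The two layering modes can be contrasted in a few lines. In this sketch, which treats each performance as a list of audio samples, vertical layering sums performances that play at the same time, while horizontal layering places one performance after another. The function names `layer_vertical` and `layer_horizontal` are hypothetical, and real mixing would also need level control, which is omitted here.

```python
def layer_vertical(existing, added):
    """Overlay two performances in time by summing them sample by sample."""
    length = max(len(existing), len(added))

    def padded(xs):
        # Pad the shorter performance with silence.
        return list(xs) + [0.0] * (length - len(xs))

    return [a + b for a, b in zip(padded(existing), padded(added))]


def layer_horizontal(first, second):
    """Place the second performance after the first in time."""
    return list(first) + list(second)


print(layer_vertical([0.25, 0.5], [0.25, 0.0, 0.125]))   # [0.5, 0.5, 0.125]
print(layer_horizontal([0.25, 0.5], [0.125]))            # [0.25, 0.5, 0.125]
```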
The host device 1300 includes a network communications interface 1302 through which the device communicates with a network and/or other users. For example, the interface 1302 may comprise a component for communication over “WiFi” networks, cellular telephone networks, the “Bluetooth” protocol, and the like. A processor 1304 controls operations of the host device. The processor comprises computer processing circuitry and is typically implemented as one or more integrated circuit chips and associated components. The device includes a memory 1306, into which the device operating system, enhanced media file application, user data, and machine-executable program instructions can be stored for execution by the processor. The memory can include firmware, random access memory (RAM), and storage media. A user input component 1308 is the mechanism through which a user can provide controls and data. The user input component can comprise, for example, a touchscreen, a keyboard or numeric pad, vocal input interface, or other input mechanism for providing user control and data input to operate the creation and collaboration application described herein. A display 1310 provides visual (graphic) output display and an audio component 1312 provides audible output for the device 1300. It should be understood that a wide variety of devices are suitable for execution of the creation and collaboration application described herein.
It will be appreciated that many additional processing capabilities are possible, according to the description herein. Further, it should be noted that the methods, systems, and devices discussed above are intended merely to be examples. Various embodiments may omit, substitute, or add various procedures or components as appropriate. For instance, it should be appreciated that, in alternative embodiments, the methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Also, features described with respect to certain embodiments may be combined in various other embodiments. Different aspects and elements of the embodiments may be combined in a similar manner. Also, it should be emphasized that technology evolves and, thus, many of the elements are examples and should not be interpreted to limit the scope of the invention.
Specific details are given in the description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, well-known circuits, processes, algorithms, structures, and techniques have been shown without unnecessary detail in order to avoid obscuring the embodiments. Further, the headings provided herein are intended merely to aid in the clarity of the descriptions of various embodiments, and should not be construed as limiting the scope of the invention or the functionality of any part of the invention. For example, certain methods or components may be implemented as part of other methods or components, even though they are described under different headings.
Also, it is noted that the embodiments may be described as a process which is depicted as a flow diagram or block diagram. Although each may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may have additional steps not included in the figures.
This application claims the benefit of priority from co-pending U.S. Provisional Patent Application No. 62/022,587 entitled “Clip Creation and Collaboration” to Robert Taub et al. filed Jul. 9, 2014. Priority of the filing date of Jul. 9, 2014 is hereby claimed, and the disclosure of the Provisional patent application is hereby incorporated by reference.
Number | Date | Country
---|---|---
62022587 | Jul 2014 | US