Inexpensive and even free desktop, web-based and mobile movie editing suites let ordinary people commemorate and celebrate vacations, sports seasons, and anniversaries by setting their pictures and videos to music.
Amateur video slideshow producers generally add songs that convey some message or have particularly relevant lyrics. When creating video slideshows, users run into a recurring problem: Songs can often be too long for that format. Long songs make the overall video too long. Or users run out of content to fit the full length of a particular song.
To solve this problem, amateur video slideshow producers generally cut the song wherever convenient and fade out before fading into or starting a new song. This technique, while easy to do, generally produces an amateurish result. More skilled users, who are comfortable with audio editing software, can listen to a song to find sections that sound similar or almost identical to one another, then find and study the waveforms associated with these sections to splice out (e.g., delete) the section of music between the two splice points. While this is likely within the skill of a professional audio engineer, it is generally beyond the abilities of all but the most skilled of amateurs. In addition, it requires both listening to a song to identify which portions of the song waveform should be studied, and studying the waveforms to identify matches, a process that takes a skilled audio engineer a significant amount of time.
Embodiments described herein include methods, devices and systems for automatically identifying segments of similar audio content in songs, identifying the best point at which to splice a song, and then splicing the song to adjust the length or duration of the song. For example, portions may be automatically deleted from a song to create a shortened version of the song that retains the beginning, a middle and the end. The song duration may be changed without requiring the song to be listened to.
In some embodiments, a method of altering a duration of a song comprises receiving a request to alter a duration of a first song, the first song being stored as a first digital file; automatically analyzing the first song to determine one or more pairs of splice locations within the first song including comparing one or more portions of the first digital file with a remainder of the first digital file; altering the duration of the first song in response to a selection of at least a first pair of splice locations of the one or more pairs of splice locations.
Automatically analyzing may include comparing a first portion of the first digital file with plural parts of a remaining portion of the first digital file to determine a plurality of similarity values, each similarity value representing a degree of similarity of the first portion with a corresponding part of the remaining portion.
Automatically analyzing may include determining, for each of the plurality of similarity values, an associated locational relationship between the first portion and one of the plural parts of the remaining portion of the first digital file.
Each of the plurality of similarity values may be determined by an extent of overlap of an area of the first portion of the first digital file with an area of a corresponding one of the plural parts of the remaining portion of the first digital file.
Each of the plurality of similarity values may be determined by performing a cross correlation calculation with the first portion of the first digital file and the remaining portion.
Each of the plurality of similarity values may be determined by determining a shared area of a waveform corresponding to the first portion of the first digital file and waveforms corresponding to a respective one of the plural parts of the remaining portion of the first digital file.
Altering the duration of the first song may include providing a second digital file comprised of the first digital file with a portion of the first digital file between a selected pair of splice locations removed.
Altering the duration of the first song may include providing instructions for playing the first song, including skipping a portion of the first song between a selected pair of splice locations.
Altering the duration of the first song may include lengthening the first song by providing a second digital file comprised of the first digital file with a portion of the first digital file between a selected pair of splice locations duplicated.
Altering the duration of the first song may include providing instructions for playing the first song, including duplicating a portion of the first song between a selected pair of splice locations.
Some disclosed embodiments include receiving a request to alter a duration of a group of songs, including the first song, each song of the group of songs being stored as a digital file; for each of the group of songs, automatically analyzing each song of the group of songs to determine one or more pairs of splice locations within each song including comparing one or more portions of the corresponding digital file with a remainder of the corresponding digital file; altering the duration of the group of songs by altering the duration of at least the first song in response to a selection of the first pair of splice locations within the first song.
Methods may further comprise altering the duration of at least the first song and a second song of the group of songs in response to a selection of the first pair of splice locations within the first song and in response to a selection of a second pair of splice locations within the second song.
Methods may further comprise receiving a user input reflecting a duration, wherein the selection of the first pair of splice locations within the first song and the selection of the second pair of splice locations within the second song is automatically performed in response to the duration.
The user input may be a duration value or one or more parameters from which the duration is calculated.
Altering the duration of the group of songs may comprise altering the duration of the group of songs to approximate or equal the duration.
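One way such an automatic selection could work is sketched below. This is a minimal illustration, not the disclosed method: the function name `choose_splices`, the input layout, and the exhaustive search strategy are all assumptions. It considers removing the material between at most one pair of splice locations per song and picks the combination whose total duration is closest to the requested duration.

```python
from itertools import product


def choose_splices(songs, target_seconds):
    """Pick at most one splice pair per song so the total is closest to target.

    songs: list of (duration_seconds, [(start, end), ...]) tuples, where each
    (start, end) is a candidate pair of splice locations in seconds.
    (Hypothetical layout, chosen for illustration only.)
    """
    options_per_song = []
    for duration, pairs in songs:
        # None means "leave this song unaltered".
        options = [(None, duration)]
        for start, end in pairs:
            # Removing the material between the pair shortens the song.
            options.append(((start, end), duration - (end - start)))
        options_per_song.append(options)

    # Exhaustive search: fine for a handful of songs, exponential in general.
    best = min(
        product(*options_per_song),
        key=lambda combo: abs(sum(d for _, d in combo) - target_seconds),
    )
    return [choice for choice, _ in best], sum(d for _, d in best)


songs = [
    (200.0, [(60.0, 100.0), (50.0, 130.0)]),
    (180.0, [(40.0, 100.0)]),
]
selection, total = choose_splices(songs, target_seconds=300.0)
print(selection, total)
```

In this example the first song is shortened by 80 seconds and the second is left unaltered, so the group duration equals the requested 300 seconds.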
Methods may further comprise associating the group of songs with altered duration to one or more of video and image files.
Methods may further comprise creating a slide show using the group of songs with altered duration.
Methods may further comprise creating plural versions of the first song with an altered duration, each version corresponding to a different pair of splice locations of the one or more splice locations; receiving one or more ratings from one or more users of at least one of the plural versions indicating a quality of the at least one of the plural versions.
Methods may further comprise providing a list of the plural versions of the first song with an altered duration, the list being responsive to the one or more ratings of the one or more users.
The order of the list may be responsive to the one or more ratings of the one or more users.
Each entry of the list may include a version indicator associated with a corresponding version of the first song with an altered duration, and a quality value associated with the version indicator. The quality value may be responsive to the one or more ratings of the one or more users.
The user rating may comprise a user selection of one of a positive icon and a negative icon via a user interface.
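A rating-driven list of versions might be organized as in the following sketch. The class name `SongVersion`, the vote counters, and the particular quality formula (fraction of positive ratings) are assumptions made for illustration; the text above does not prescribe how the quality value is computed.

```python
from dataclasses import dataclass


@dataclass
class SongVersion:
    version_id: str      # version indicator shown in the list
    up_votes: int = 0    # selections of the positive icon
    down_votes: int = 0  # selections of the negative icon

    @property
    def quality(self) -> float:
        # Assumed quality value: share of positive ratings (0.0 if unrated).
        total = self.up_votes + self.down_votes
        return self.up_votes / total if total else 0.0


def rate(version: SongVersion, positive: bool) -> None:
    """Record one user rating for a version."""
    if positive:
        version.up_votes += 1
    else:
        version.down_votes += 1


def ranked(versions):
    """Return the versions ordered from highest to lowest quality value."""
    return sorted(versions, key=lambda v: v.quality, reverse=True)


version_a = SongVersion("A")
version_b = SongVersion("B")
rate(version_a, True)
rate(version_a, False)
rate(version_b, True)
for v in ranked([version_a, version_b]):
    print(v.version_id, v.quality)
```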
Methods may further comprise altering the duration of the first song by repeating the first song between the pair of splice locations.
Methods may further comprise playing with a phone the first song of altered duration as a ringtone upon receiving a phone call.
Methods may further comprise providing information to display the one or more pairs of splice locations to a user; and receiving a selection of a pair of splice locations from a user.
Methods may further comprise providing display information comprising a timeline representation of the song and a plurality of splice location indicators associated with the timeline representation of the song; and receiving a selection of two of the splice location indicators.
Methods may further comprise receiving a user input responsive to a user manipulation of a cursor to match two of the splice location indicators.
The step of automatically analyzing the first song may be performed prior to receiving the request to alter the first song, and altering the duration of the first song in response to the selection may be performed without reanalyzing the first song.
Methods may further comprise automatically analyzing a plurality of songs, including the first song, to determine, for each song, at least one version of the corresponding song having an altered duration and storing altered song duration information corresponding to each version; and then receiving the request to alter a duration of the first song; and then providing altered song duration information of the first song.
The altered song duration information may comprise a quality ranking. User feedback may be received regarding a version of a song having an altered duration, and a quality ranking may be changed in response to the user feedback.
Automatically analyzing may include comparing plural portions of the first digital file with plural parts of a remaining portion of the first digital file to determine, for each portion of the first digital file, a plurality of similarity values, each similarity value representing a degree of similarity of the corresponding portion with a corresponding part of the remaining portion.
Methods may further comprise comparing plural pairs of splice locations to determine similar pairs of splice locations. Determining similar pairs of splice locations may comprise determining a segment duration between each pair of splice locations and comparing the determined segment durations.
Methods may also comprise clustering similar pairs of splice locations into a pair of splice objects. Selection of the first pair of splice locations may comprise selecting the pair of splice objects.
Methods may comprise providing a list of pairs of splice locations, and clustering may comprise providing only one of the similar pairs of a splice object in the list.
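The clustering described above can be sketched as follows, under stated assumptions: each pair is a hypothetical `(start, end, score)` tuple in seconds, two pairs are treated as similar when their segment durations differ by no more than a `tolerance`, and the highest-scoring pair represents each cluster in the list. The tuple layout, the tolerance value, and the choice of representative are illustrative only.

```python
def cluster_by_duration(pairs, tolerance=0.25):
    """Group splice pairs whose segment durations are within `tolerance` seconds.

    pairs: list of (start, end, score) tuples (hypothetical layout).
    Each cluster is compared against the duration of its first member.
    """
    ordered = sorted(pairs, key=lambda p: p[1] - p[0])
    clusters = []
    for pair in ordered:
        duration = pair[1] - pair[0]
        if clusters:
            first = clusters[-1][0]
            if duration - (first[1] - first[0]) <= tolerance:
                clusters[-1].append(pair)
                continue
        clusters.append([pair])
    return clusters


def representatives(clusters):
    """Keep only one pair per cluster: here, the one with the highest score."""
    return [max(cluster, key=lambda p: p[2]) for cluster in clusters]


pairs = [(10.0, 40.0, 0.90), (10.1, 40.2, 0.95), (20.0, 80.0, 0.80)]
clusters = cluster_by_duration(pairs)
print(representatives(clusters))
```

Here the first two pairs span nearly the same 30-second segment and collapse into one list entry, while the 60-second pair remains a separate entry.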
Embodiments contemplate a non-transitory, tangible, computer readable storage medium comprising a program that when executed by a computer system performs one or more of the methods described herein.
Embodiments contemplate a processor configured to perform one or more of the methods described herein. Embodiments may comprise a computer system configured to execute programs stored in the non-transitory, tangible, computer readable storage medium to execute one or more of the methods described herein.
Example embodiments will be more clearly understood from the following brief description taken in conjunction with the accompanying drawings. The accompanying drawings represent non-limiting, example embodiments as described herein.
Various exemplary embodiments will be described more fully hereinafter with reference to the accompanying drawings, in which some exemplary embodiments are shown. The present invention may, however, be embodied in many different forms and should not be construed as limited to the example embodiments set forth herein. These example embodiments are just that—examples—and many implementations and variations are possible that do not require the details provided herein. It should also be emphasized that the disclosure provides details of alternative examples, but such listing of alternatives is not exhaustive. Furthermore, any consistency of detail between various examples should not be interpreted as requiring such detail—it is impracticable to list every possible variation for every feature described herein. The language of the claims should be referenced in determining the requirements of the invention. In the drawings, the sizes and relative sizes of layers and regions may be exaggerated for clarity. Like numerals refer to like elements throughout, and thus repetitive description may be omitted.
When an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected” or “directly coupled” to another element, there are no intervening elements present. Other words used to describe the relationship between elements or layers should be interpreted in a like fashion (e.g., “between” versus “directly between,” “adjacent” versus “directly adjacent,” “on” versus “directly on”). As used herein the term “and/or” includes any and all combinations of one or more of the associated listed items.
Although the terms “first”, “second”, etc. may be used herein to describe various elements, components, sections, etc., these elements, components, sections, etc. should not be limited by these terms. Unless indicated otherwise, these terms are only used to distinguish one element, component, section, etc. from another. Thus, a first element, component, region, or section could be termed a second element, component, region, or section without departing from the teachings of example embodiments.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “includes” and/or “including,” if used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, such as those defined in commonly-used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
“Splice point” as used herein refers to an identified location within a song (i.e., a time from the beginning of the song or other time marker) that identifies an option to delete or add other portions of the song after that splice point (to provide a song having a different length).
“Paired splice points” or “Grouped splice points” (or similar language, such as “pairs of splice points” or “groups of splice points”) refers to two or more splice points relating to each other due to similarity of the song at these splice points. Paired splice points or two splice points of a group may be used to locate portions of a song for splicing.
“Splicing” refers to the process of joining different portions of a song together that may result in a shorter or longer song. The resulting spliced song may take the form of a new digital representation of a song (e.g., a new digital music file representing the altered song). Alternatively, the spliced song may take the form of instructions to play a particular sequence of identified portions of the original song (e.g., a pointer list to indexed time locations of the song).
“Slide show” refers to a visual presentation that may comprise the display of one or more still pictures and/or video. The slide show may also comprise background music such as one or more songs.
“Waveform” is the representation of an amplitude of a signal over time. A waveform may be digital or analog.
“Audio Waveform” is a waveform of an audio signal. The audio signal may represent a voltage signal or an acoustic vibration, for example.
“Similar waveforms” refers to waveforms that are determined to meet a certain threshold of likeness. Similar waveforms may be identical waveforms.
Locational terms, such as “location” or “point”, with respect to a song or song waveform indicate a location, point, etc. along the x-axis and may represent a time value, such as a time from the beginning of a song.
“Song” refers to a musical composition. A song may include only instrumental music, may include only vocals or may include a combination of vocals and instruments. A song may comprise a musical composition of audio recordings (such as recordings of birds, the ocean, traffic, etc.).
A “computer” refers to one or more apparatus and/or one or more systems that are capable of accepting a structured input, processing the structured input according to prescribed rules, and producing results of the processing as output. Examples of a computer may include: a computer; a stationary and/or portable computer; a computer having a single processor, multiple processors, or multi-core processors, which may operate in parallel and/or not in parallel; a general purpose computer; a supercomputer; a mainframe; a super mini-computer; a mini-computer; a workstation; a micro-computer; a server; a client; an interactive television; a web appliance; a telecommunications device with internet access; a hybrid combination of a computer and an interactive television; a portable computer; a tablet personal computer (PC); a personal digital assistant (PDA); a portable telephone; application-specific hardware to emulate a computer and/or software, such as, for example, a digital signal processor (DSP), a field-programmable gate array (FPGA), an application specific integrated circuit (ASIC), an application specific instruction-set processor (ASIP), a chip, chips, or a chip set; a system on a chip (SoC), or a multiprocessor system-on-chip (MPSoC); an optical computer; a quantum computer; a biological computer; and an apparatus that may accept data, may process data in accordance with one or more stored software programs, may generate results, and typically may include input, output, storage, arithmetic, logic, and control units.
“Software” refers to prescribed rules to operate a computer. Examples of software may include: software; code segments; instructions; applets; pre-compiled code; compiled code; interpreted code; computer programs; and programmed logic.
A “computer-readable medium” refers to any storage device used for storing data accessible by a computer. Examples of a computer-readable medium may include: a magnetic hard disk; a floppy disk; an optical disk, such as a CD-ROM and a DVD; a magnetic tape; a flash removable memory; a memory chip; and/or other types of media that can store machine-readable instructions thereon.
A “computer system” refers to a system having one or more computers, where each computer may include a computer-readable medium embodying software to operate the computer. Examples of a computer system may include: a distributed computer system for processing information via computer systems linked by a network; two or more computer systems connected together via a network for transmitting and/or receiving information between the computer systems; and one or more apparatuses and/or one or more systems that may accept data, may process data in accordance with one or more stored software programs, may generate results, and typically may include input, output, storage, arithmetic, logic, and control units.
A “network” refers to a number of computers and associated devices that may be connected by communication facilities. A network may involve permanent connections such as cables or temporary connections such as those made through telephone or other communication links. A network may further include hard-wired connections (e.g., coaxial cable, twisted pair, optical fiber, waveguides, etc.) and/or wireless connections (e.g., radio frequency waveforms, free-space optical waveforms, acoustic waveforms, etc.). Examples of a network may include: an internet, such as the Internet; an intranet; a local area network (LAN); a wide area network (WAN); and a combination of networks, such as an internet and an intranet. Exemplary networks may operate with any of a number of protocols, such as Internet protocol (IP), asynchronous transfer mode (ATM), and/or synchronous optical network (SONET), user datagram protocol (UDP), IEEE 802.x, etc.
Embodiments described herein include a method, device and system for automatically adjusting a length of a song. Embodiments of devices and systems disclosed herein may be configured to implement the methods, and methods of the embodiments may be described with respect to the disclosed devices and systems. For example, disclosed embodiments may identify segments of similar audio content in a song, identify the best point at which to splice the song, and splice the song to adjust the length of the song. For example, portions may be automatically deleted from a song to create a shortened version of the song while still retaining the beginning, a middle portion and the end of the song. By maintaining the beginning and end of the song, a shortened version may avoid annoying cutoffs (e.g., fadeouts) in the middle of the song while still shortening the length of the song. Similarly, a song may be automatically lengthened without noticeable detection by a listener by splicing portions of a song together at automatically detected splice points within the song.
Automatically adjusting the length of a song is useful for many applications. For example, in creating slide shows to share pictures (e.g., for kids' sports functions, family vacations, parties, etc.) a certain amount of time may be appropriate for the slide show (e.g., 5 minutes). In adding background music to the slideshow, songs are often undesirably long for the slide show—a user may wish to add multiple songs to match different themes of the slide show, to remind the listener of some experience relating to the song (e.g., favorite songs of a trip, friends, a team, etc.) or simply to create more musical diversity for the slide show to make the slide show more interesting. However, as modern song lengths (e.g., song lengths of pop, rock, hip hop songs) are typically at least three minutes or more (often in the range of 3 to 5 minutes, many over 5 minutes), using desired songs as background music for a slide show of limited length (e.g., 5 minutes) cannot be done without limiting the lengths of one or more songs. It is often desirable to end the slide show with the original end of the background music (rather than simply fading out the song to end the slide show).
Other applications in adjusting a length of a song according to embodiments of the invention include creating a remix of a song (e.g., a DJ may desire certain songs be shortened or lengthened), matching song length to a commercial, a video segment of a movie, or matching a song length or length of plural songs to an event of a known duration (e.g., a presentation of still and/or video images, such as a slide show).
Embodiments herein are directed to methods, devices and systems that automatically identify two or more portions of a song waveform having sufficient similarity and points within these portions (sometimes referred to herein as “paired splice points” or “grouped splice points”) that may be selected to interrupt the original song waveform to delete or insert portions of the song. The embodiments also describe automatically modifying or assisting a user to modify the length of a song using these identified splice points to change the length of the song with little or no audible artifacts evident to the listener. It should be emphasized that many of the embodiments disclosed herein are not mutually exclusive but are useable with one another, in whole or in part; it is impracticable to set forth a separate description for each possible combination of features of the embodiments described herein, and thus a particular combination of features according to the invention may be described in connection with separate embodiments in this disclosure. Further, devices and systems described herein are contemplated to be configured to perform the methods described herein, and steps of methods may be represented by the portions of this disclosure describing device and system embodiments.
One exemplary method of finding similarities in audio segments from different parts of a song is shown in
In step S34, the comparison (e.g., similarity analysis) results are used to automatically determine the existence and location of corresponding splice points within the song waveform. For example, when the audio waveform segment and the song waveform are highly similar, there will be significant overlap and a resulting peak of the cross correlation calculation results will be obtained, corresponding to the location of the overlap of the audio waveform segment and the song waveform. The size of this peak may be used to determine an amount of similarity. If the size of this peak is large enough (e.g., over a certain fixed or calculated threshold) it may be determined that a match exists between the audio waveform segment and the song waveform. When a match exists, the location of this peak in the cross correlation calculation results indicates where the segment and the song match to derive a pair of corresponding “splice points” (sometimes referred to herein as “paired splice points” or “grouped splice points”). The cross correlation calculation results may have multiple peaks that may reveal multiple similarity matches with the audio waveform segment and the portions of the song waveform, and thus the similarity analysis may detect grouped splice points comprising more than two splice points.
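The peak test of step S34 can be illustrated with a short sketch. It assumes the similarity results are available as a list indexed by sample offset, uses a fixed threshold, and treats a peak as a local maximum; the function names and the sample values are hypothetical and not taken from this disclosure.

```python
def find_peaks(scores, threshold):
    """Return (offset, value) for local maxima at or above `threshold`.

    The first and last entries are skipped because they have only one
    neighbor; a fuller implementation would need a rule for the edges.
    """
    peaks = []
    for i in range(1, len(scores) - 1):
        is_local_max = scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]
        if is_local_max and scores[i] >= threshold:
            peaks.append((i, scores[i]))
    return peaks


def to_splice_pairs(peaks, segment_start):
    """Pair each matching offset with the offset the segment was taken from.

    The peak at `segment_start` is the segment matching itself and is dropped.
    """
    return [
        (min(offset, segment_start), max(offset, segment_start))
        for offset, _ in peaks
        if offset != segment_start
    ]


scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.5, 0.95, 0.4, 0.2]
peaks = find_peaks(scores, threshold=0.8)
print(peaks)
print(to_splice_pairs(peaks, segment_start=6))
```

With these made-up values, offsets 2 and 6 exceed the threshold; since the segment was taken from offset 6, the only pair of corresponding splice points is (2, 6).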
In step S36, the song may be shortened or lengthened using the automatically determined splice points. For example, the portion of a song waveform extending between paired splice points (or splice points of the same group) may be removed to shorten the song. Alternatively, the song may be lengthened by (for splice points of the same group) replacing the song waveform occurring after a later splice point with the portion of a song after an earlier splice point (thus repeating a section of the song to the later splice point before playing the song after the later splice point). The modification of the song waveform may be performed by creating a new digital music file (with segments of data corresponding to the song waveform being modified accordingly) or by creating a set of instructions to sequentially play identified portions of the original song. The resulting song of adjusted length may include the same beginning and ending of the original song.
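Both forms of modification in step S36 can be sketched with sample lists. The function names are illustrative assumptions, and splice points are given as sample indices. `shorten` and `lengthen` build a new sample sequence, while the `*_instructions` functions build a list of (start, end) ranges of the original song to play in order.

```python
def shorten(samples, earlier, later):
    """Remove the samples between the paired splice points."""
    return samples[:earlier] + samples[later:]


def lengthen(samples, earlier, later):
    """Play up to the later splice point, then resume from the earlier one."""
    return samples[:later] + samples[earlier:]


def shorten_instructions(total, earlier, later):
    """Ranges of the original song to play, skipping earlier..later."""
    return [(0, earlier), (later, total)]


def lengthen_instructions(total, earlier, later):
    """Ranges of the original song to play, repeating earlier..later once."""
    return [(0, later), (earlier, total)]


def render(samples, instructions):
    """Play the listed (start, end) ranges of the original in sequence."""
    out = []
    for start, end in instructions:
        out.extend(samples[start:end])
    return out


song = list(range(10))          # stand-in for a sampled song waveform
print(shorten(song, 3, 7))
print(lengthen(song, 3, 7))
print(render(song, shorten_instructions(len(song), 3, 7)))
```

In both cases the result begins and ends with the same samples as the original, consistent with keeping the original beginning and ending of the song.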
As shown in
Next, audio waveform segment W1 is aligned and compared with the portion of the song waveform 60 from where it was extracted (i.e., a similarity analysis is performed) to determine how similar W1 is to that same length section of the song waveform 60. Because audio waveform segment W1 was extracted from the end of the song waveform 60, it is initially aligned with the end of the song waveform 60. For this first comparison (in performing a similarity analysis), the two are identical, since W1 was cut from that section of the song.
Next, as represented by
The cross correlation similarity analysis described with respect to
where f and g represent discrete functions of the appropriate channels of audio waveform segment W1 and the song waveform 60 and f* represents the complex conjugate of the function f.
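For reference, the standard definition of discrete cross correlation is (f ⋆ g)[n] = Σ_m f*[m] g[m + n]; whether this matches the exact form intended above cannot be confirmed from the surrounding text. The sketch below applies that standard definition to real-valued samples, for which the complex conjugate has no effect, and evaluates only offsets at which the segment lies entirely within the song. The function name and the toy sample values are assumptions.

```python
def cross_correlate(segment, song):
    """Cross correlation of a short segment (f) against a song (g).

    Returns one value per offset n, computed as sum over m of
    segment[m] * song[n + m], for offsets where the segment fully overlaps
    the song. Samples are assumed real-valued.
    """
    seg_len = len(segment)
    return [
        sum(segment[m] * song[n + m] for m in range(seg_len))
        for n in range(len(song) - seg_len + 1)
    ]


# Toy "song" in which the shape 1, 2, 1 occurs twice.
song = [0, 1, 2, 1, 0, 0, 1, 2, 1, 0]
segment = song[6:9]             # extracted near the end of the song
result = cross_correlate(segment, song)
print(result)
```

The two largest values fall at offsets 1 and 6: offset 6 is the segment compared with itself, and offset 1 is the earlier, similar portion of the song.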
As shown in
The results of the initial comparison of waveform segment W1 with itself as embedded in the song waveform 60 (exemplified by
As shown in
Splice points may be determined by analyzing peaks resulting from the comparison of the audio waveform segments Wi to the song waveform 60. In some embodiments, the splice points are selected as those corresponding to the highest peak magnitudes (e.g., the top 20) of all resulting peaks of all audio waveform segment comparisons (here, of all peaks resulting from the comparisons of audio waveform segments W1 to Wn to the song waveform 60). Alternatively, the splice points are selected, for each audio waveform segment comparison (e.g., cross correlation calculation), as those corresponding to the highest peaks resulting from that comparison. Other selection criteria may be employed, such as selection criteria based on quality analysis (e.g., peak comparison) as well as criteria that provide a variety of song versions of different lengths (e.g., if a large group (e.g., 10 or 20) of high quality splice point pairs results in a similar song length (or results in the same portion of the song being spliced), a lower quality splice point pair may be selected to replace one of the higher quality splice point pairs).
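The first selection rule above (keep the highest peaks across all segment comparisons) might look like the following sketch. The dictionary layout mapping each segment's starting offset to its list of `(match_offset, magnitude)` peaks is a hypothetical structure chosen for illustration.

```python
import heapq


def top_splice_candidates(peaks_by_segment, limit=20):
    """Keep the `limit` highest peaks over all segment comparisons.

    peaks_by_segment: {segment_start: [(match_offset, magnitude), ...]}
    Returns (earlier_point, later_point, magnitude) tuples, best first.
    """
    all_peaks = [
        (magnitude, segment_start, match_offset)
        for segment_start, peaks in peaks_by_segment.items()
        for match_offset, magnitude in peaks
    ]
    best = heapq.nlargest(limit, all_peaks)
    return [(min(a, b), max(a, b), magnitude) for magnitude, a, b in best]


peaks_by_segment = {
    100: [(20, 0.90), (60, 0.70)],   # peaks found for the segment at offset 100
    90: [(30, 0.95)],                # peaks found for the segment at offset 90
}
candidates = top_splice_candidates(peaks_by_segment, limit=2)
print(candidates)
```

The per-comparison alternative would instead apply the same ranking separately to each segment's own list of peaks.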
Note that different groups of splice points provide different options for splicing the song waveform 60; i.e., interrupting the original song waveform at a splice point of the nth group should be followed by the original song starting at another splice point of the same nth group. In the example shown in
Sequences of similarity analyses other than that described above may be performed between audio waveform segments Wi and the song waveform 60. For example, extraction of the audio waveform segments may start from the beginning of the song. As another example, cross correlation calculations may start with a comparison with a portion of the beginning of the song. It may be desirable that, after all audio waveform segments Wi have been extracted, all segments have been compared with each other, whether as an extracted audio waveform segment Wi or embedded as part of the song waveform 60. It may be unnecessary that later extracted audio waveform segments be compared to the entire song waveform 60. For example, it is unnecessary to perform a cross correlation of audio waveform segment W2 starting at the end of the song waveform 60, as this comparison was previously made when audio waveform segment W1 was compared to the portion of song waveform 60 including W2 (the portion from which audio waveform segment W2 was extracted).
As discussed, after determining a match between an audio waveform segment Wi and a corresponding portion of the song waveform 60, splice points are associated with each other to form grouped splice points (and may form plural groups of splice points) to provide options for altering the length of the song. The exact location of the splice points within the audio waveform segment Wi and the corresponding portion of the song waveform 60 may be selected in many ways. One way is to simply choose the earliest time location of the audio waveform segment Wi and a corresponding matching point of the song waveform 60 (aligned to the audio waveform segment Wi as determined by the offset of the peak resulting from the cross correlation calculations). Pyramid processing may be performed to compare subdivided portions of the audio waveform to the corresponding portions of the song waveform 60. If only some of these subdivided portions are shown to match, the splice point pair may be adjusted so that they start at or fall within matching portions of the audio waveform segment Wi and the song waveform 60. Pyramid processing is described further below.
The songs that may be used with this method are often represented by digitally sampled waveforms. The music may be captured at a defined sample rate; 44,100 samples per second is one popular sampling rate. As noted, the audio waveform segments (e.g., W1) may be slid across the entire song waveform 60 to compare the audio waveform segment to the song waveform 60. When the song waveform 60 is represented in digital format, the sliding of the audio waveform segment may be performed with a step size of one sample, with a similarity analysis performed at each of these steps (separated by a sample width). This may be computationally very expensive. One way to reduce calculations is to perform the similarity analysis between decimated versions of the audio waveform segments Wi and the song waveform 60 (keeping every Mth sample), or otherwise to use only every Mth sample of the audio waveform segments Wi and the song waveform 60 in comparing the waveforms. M may be, for example, an integer between 5 and 10.
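Keeping every Mth sample can be sketched in a few lines. The function names are illustrative. Note that dropping samples without low-pass filtering first can alias high-frequency content; whether that matters for the similarity analysis is not addressed here, and the sketch simply follows the "keep every Mth sample" description.

```python
def decimate(samples, m):
    """Keep every m-th sample, starting with the first."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return samples[::m]


def offset_in_original(decimated_offset, m):
    """Map an offset found in decimated data back to original samples."""
    return decimated_offset * m


one_second = [0.0] * 44100            # one second at 44,100 samples per second
reduced = decimate(one_second, 10)
print(len(reduced))                   # samples left per second when M = 10
print(offset_in_original(12, 10))     # decimated offset 12 in original samples
```

With M = 10, both the segment and the song shrink by a factor of ten, so a sliding comparison evaluates roughly one tenth as many offsets, each over one tenth as many samples. An offset found this way is accurate only to within M original samples.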
In another example, rather than compare an audio waveform segment W to the entire song waveform 60 in one calculation, each audio waveform segment W may be “stepped” across the song waveform 60 for comparison to only a portion of the song waveform 60 at a time. Each portion of the song waveform 60 is then compared to an audio waveform segment W in a separate cross correlation calculation, which may reduce the complexity of each of these cross correlation calculations (e.g., for each audio waveform segment Wi, a cross correlation calculation may be performed for each of the multiple portions of the song waveform 60). The step size (or size of the portions of divided song waveform 60) may be equal to or more than 2 times the size of Wi or equal to or less than 5 times the size of Wi. One could create step sizes that are larger, perhaps one or two seconds, or even the same size as the sample waveform, Wi. For example, one may use a step size that is one half the width of the sample window (the audio waveform segment), Wi. When these portions of the song waveform 60 and audio waveform segments Wi are at least five seconds in length, it has been found that similarity analyses produce splice points that better reflect similarities at a chorus or refrain level of the song, making length changes to the song less detectable. Splice points that better reflect similarities at a chorus or refrain level of the song may be better discriminated when at least one of the portions of the song waveform 60 and the audio waveform segments Wi is 10 seconds or more and the other is at least 5 seconds. When the audio waveform segment Wi and these portions of the song waveform 60 are smaller (e.g., less than two seconds in length), similarity analyses may create splice point pairs that result from similarities in background music, which may be less desirable splice point pairs.
Thus, it may be desirable in some implementations to keep the size of the audio waveform segment Wi and the portions of the song waveform 60 greater than 2 seconds.
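The stepping described above can be sketched as a function that lists where each song portion begins. The name `portion_starts` and the sizes in the example are assumptions for illustration; the final portion is aligned with the end of the song so that no tail is left uncompared, which is one possible design choice rather than a requirement of the method.

```python
def portion_starts(song_len, portion_len, step):
    """Return start indices of song portions of portion_len samples.

    Portions advance by `step` samples; a last portion flush with the end of
    the song is added if the regular steps do not already reach it.
    """
    if portion_len <= 0 or step <= 0:
        raise ValueError("portion_len and step must be positive")
    if portion_len >= song_len:
        return [0]
    last_start = song_len - portion_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return starts


# Hypothetical sizes: five-second segment, portions twice the segment size,
# step of one half the segment width, 44,100 samples per second.
rate = 44_100
segment_len = 5 * rate
starts = portion_starts(song_len=180 * rate,
                        portion_len=2 * segment_len,
                        step=segment_len // 2)
print(len(starts), "portions to cross correlate against each segment")
```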
Other sequences of similarity analyses may be performed between audio waveform segments Wi and the song waveform 60. It is desirable that, after all audio waveform segments Wi have been extracted, all segments have been compared with each other, whether as an extracted audio waveform segment W or embedded as part of the song waveform 60.
In the examples discussed herein, cross correlation is used in performing the similarity analyses of the audio waveform segment Wi and the song waveform 60. The cross-correlation of two vectors may be computed as the inverse Fourier transform of the product of their respective Fourier transforms, with one of the transforms conjugated. However, alternative embodiments contemplate other ways to measure the similarity between an audio segment and the song, ranging from time-domain methods, such as subtracting samples between Wi and the song waveform to find a minimum difference measure, to performing frequency domain analysis using Fourier methods to find similar signal matches. The system and software for adjusting song length may implement various methods of similarity analysis such as time domain/cross correlation methods and/or frequency domain/Fourier analysis.
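The relationship between the time-domain and Fourier forms of cross correlation can be checked on a toy signal. The sketch below is an illustration only: it uses a naive O(n²) discrete Fourier transform written with the standard library (a real system would use a fast Fourier transform), the correlation is circular and unnormalized, and all function names and sample values are assumptions.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def xcorr_direct(segment, song):
    """Circular cross correlation: r[k] = sum over t of segment[t] * song[t + k]."""
    n = len(song)
    return [sum(segment[t] * song[(t + k) % n] for t in range(n))
            for k in range(n)]


def xcorr_fourier(segment, song):
    """Same result via the product of transforms, one of them conjugated."""
    seg_f, song_f = dft(segment), dft(song)
    product = [a.conjugate() * b for a, b in zip(seg_f, song_f)]
    return [v.real for v in dft(product, inverse=True)]


def peak_offset(correlation):
    """Offset at which the correlation is largest."""
    return max(range(len(correlation)), key=correlation.__getitem__)


# Toy data: the pattern 1, 2, 3 occurs in the song starting at sample 2.
song = [0, 0, 1, 2, 3, 0, 0, 0]
segment = [1, 2, 3] + [0] * (len(song) - 3)  # zero-padded to the song length

print(peak_offset(xcorr_direct(segment, song)))
print(peak_offset(xcorr_fourier(segment, song)))
```

Both routes locate the same peak; the offset of that peak is what aligns the segment with the matching portion of the song.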
Once point(s) of similarity have been identified and designated as optional splice points, the software may then splice the song, to make it shorter (or longer, as desired).
To make the song longer, rather than splice out a portion of the song, a portion of the song may be repeated. For example, rather than deleting a portion of the song between two identified splice points, the part of the song beginning at the earlier splice point may be appended to the part of the song ending at the later splice point, thereby repeating the section between the two splice points. This may be done one or more times to repeat the section one or more times. In this example, the song may remain unchanged at portions corresponding to the identified splice point locations for certain portions of the resulting song.
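Shortening and lengthening at a splice point pair can be sketched on a list of samples as follows. The function names and the use of integer sample indices are assumptions for illustration, and the sketch joins the pieces directly without any crossfade or other smoothing at the joint.

```python
def shorten(song, early, late):
    """Delete the samples between the two splice points (indices early..late)."""
    if not 0 <= early <= late <= len(song):
        raise ValueError("splice points must satisfy 0 <= early <= late <= len(song)")
    return song[:early] + song[late:]


def lengthen(song, early, late, repeats=1):
    """Repeat the section between the splice points `repeats` extra times."""
    if not 0 <= early <= late <= len(song):
        raise ValueError("splice points must satisfy 0 <= early <= late <= len(song)")
    return song[:late] + song[early:late] * repeats + song[late:]


song = list(range(10))            # stand-in for audio samples
print(shorten(song, 3, 7))        # section 3..6 removed
print(lengthen(song, 3, 7))       # section 3..6 played twice
```

With `repeats=1`, `lengthen` is equivalent to joining the song up to the later splice point with the song from the earlier splice point onward.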
A successful cross correlation match may not guarantee that the waveform will be sufficiently similar across the entire audio waveform segment Wi. For example, if two thirds of the audio waveform segment Wi match the song waveform 60, a match may be detected between Wi and a portion of song waveform 60, even if a third of the audio waveform Wi and the corresponding portion of the song waveform 60 do not match. Using resulting splice points for splicing the original song might produce an undesirable result. Additional processing may be done to adjust the splice point to a point in the audio that is sufficiently similar. One method is to perform pyramid processing. For example, after detecting a match in the initial similarity analysis between segment Wi and the song waveform 60, the audio waveform segment Wi may be subdivided into smaller segments and similarity calculations may be performed on each of these sub-segments. Referring to
In
Even more processing may be performed to find the most precise match that is possible. As shown in
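One level of the pyramid processing described above can be sketched as follows: the matched segment and the aligned portion of the song are split into sub-segments, each sub-segment is scored separately, and the splice point is moved to the start of the first sub-segment that still matches. The normalized dot product used as the score, the threshold value, the function names and the sample data are all assumptions for illustration; the same step could be repeated on the matching sub-segments for finer levels.

```python
import math


def similarity(a, b):
    """Normalized dot product of two equal-length sequences, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def subsegment_matches(segment, aligned, parts, threshold=0.95):
    """Score each of `parts` sub-segments; return (start, end, matched) triples."""
    n = len(segment)
    bounds = [n * i // parts for i in range(parts + 1)]
    return [(lo, hi, similarity(segment[lo:hi], aligned[lo:hi]) >= threshold)
            for lo, hi in zip(bounds, bounds[1:])]


def adjusted_splice_offset(segment, aligned, parts, threshold=0.95):
    """Offset of the first matching sub-segment, or None if none match."""
    for lo, _hi, matched in subsegment_matches(segment, aligned, parts, threshold):
        if matched:
            return lo
    return None


# Toy data: the first third differs, the remaining two thirds are identical.
segment = [1, 2, 3, 4, 5, 6, 7, 8, 1, -1, 1, -1]
aligned = [9, -9, 9, -9, 5, 6, 7, 8, 1, -1, 1, -1]
print(adjusted_splice_offset(segment, aligned, parts=3))
```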
There are a number of ways to determine how to modify the length of the songs based on the automatically detected splice points. The embodiments below describe shortening one or more songs, but they are equally applicable to lengthening one or more songs (or a mixture of shortening one or more songs and lengthening one or more songs). The shortened songs may be spliced from the original song while maintaining the beginning and end of the original song. In one embodiment, song lengths are shortened (e.g., automatically without being based on input from a particular user) to a length typically desirable for use in a slide show. For example, the original song may be shortened by one to two minutes, more than two minutes, or to one or more versions that have lengths that fall within a range of 40 seconds to 90 seconds, or that are less than 75% of the original length or are less than 50% of the original length. These shortened songs may include the beginning and end of the original song. A user may be presented with a list of the songs and their durations from which to select songs. The shortened songs may be added as background music to a slide show or other audio visual presentation.
In another example, the user may submit a song for shortening utilizing one of the embodiments described herein. The user may receive a shortened song that is determined to provide the least ability to detect a splice of the original song, e.g., based on similarity analysis described herein. Such similarity analysis of various splice options may review peaks resulting from previous cross correlation calculations and select splice points corresponding to the maximum peak. Alternatively, the splice points may be chosen based upon clustering of splice points, indicating a significant length of similarity within the song.
In another example, the user may submit a song for shortening utilizing one of the embodiments described herein. The initial shortened song may be selected from paired splice points that result in a shortened song having the least time removed (as compared to other options for song shortening using the groups of splice points). The user may press a button (or otherwise request) to obtain a next shorter song, resulting in a shortened song of the next shortest length among the options for a shortened song. Alternatively, the user may first be presented with the shortest shortened song and each subsequent request may result in the next shortest song (e.g., after the shortest song, the user may receive the second shortest shortened song among the options for a shortened song based on the determined splice points).
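The two selection strategies just described, choosing the splice pair with the maximum cross correlation peak and stepping through options ordered by the time removed, can be sketched with a small record type. The `SplicePair` class, its fields and the sample values are hypothetical and are used only to make the ordering concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplicePair:
    early: float  # earlier splice point, in seconds
    late: float   # later splice point, in seconds
    peak: float   # height of the cross correlation peak for this pair

    @property
    def removed(self) -> float:
        """Seconds removed from the song if this pair is used to shorten it."""
        return self.late - self.early


def least_detectable(pairs):
    """Pair with the maximum cross correlation peak."""
    return max(pairs, key=lambda p: p.peak)


def by_increasing_removal(pairs):
    """Options ordered so that each 'next shorter' request removes more time."""
    return sorted(pairs, key=lambda p: p.removed)


def shortened_length(original_length, pair):
    return original_length - pair.removed


options = [SplicePair(30.0, 90.0, 0.82),
           SplicePair(45.0, 75.0, 0.91),
           SplicePair(20.0, 140.0, 0.77)]
for pair in by_increasing_removal(options):
    print(shortened_length(200.0, pair))
```

Reversing the sorted list gives the alternative presentation in which the shortest version is offered first.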
Alternatively, the user may be shown a song waveform with the groups of splice points indicated by labels (“splice handles”). For example, the splice handles may be similar to those represented in
In a further example, independently calculated splice point pairs of a song are analyzed for their similarity in differences in time in the song. The method “clusters” valid splice points (or clusters splice points determined to have sufficient quality). For example, adjacent splice points that have similar splice features may be clustered or grouped into a single splice object. Users can then interact with that single splice object rather than a single splice point. The splice object may be used equivalently to a splice point of a splice point pair, and be selected with a corresponding splice point object to shorten or lengthen a song. As part of clustering, a single one of the splice points of a clustered splice object may be provided to a user and/or used in splicing the song in a manner described herein.
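A simple way to picture the clustering is to group splice point pairs whose earlier points are adjacent and whose time differences (later point minus earlier point) are nearly equal. The sketch below is one possible grouping rule under those assumptions; the function names, tolerances and sample pairs are hypothetical.

```python
def cluster_pairs(pairs, max_gap, max_diff_delta):
    """Group (early, late) pairs, in seconds, into clusters.

    A pair joins the current cluster when its early point is within max_gap
    of the previous pair's early point and its time difference (late - early)
    is within max_diff_delta of the previous pair's time difference.
    """
    clusters = []
    for early, late in sorted(pairs):
        if clusters:
            prev_early, prev_late = clusters[-1][-1]
            adjacent = early - prev_early <= max_gap
            similar = abs((late - early) - (prev_late - prev_early)) <= max_diff_delta
            if adjacent and similar:
                clusters[-1].append((early, late))
                continue
        clusters.append([(early, late)])
    return clusters


def representative(cluster):
    """Single pair offered to the user for a cluster (here, the middle one)."""
    return cluster[len(cluster) // 2]


pairs = [(10.0, 40.0), (10.5, 40.5), (11.0, 41.0), (60.0, 120.0), (60.5, 120.5)]
for cluster in cluster_pairs(pairs, max_gap=1.0, max_diff_delta=0.1):
    print(len(cluster), "pairs, representative", representative(cluster))
```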
Clustering of similar splice points becomes apparent when valid (possible) splice point pairs are mapped. Valid splice point pairs refer to splice point pairs determined to meet some quality criteria (e.g., due to evaluation of a comparison of a segment to a remaining portion of a song, as discussed elsewhere herein). The graph of
As shown in
In one version of splice point clustering, the user can manipulate clustered splices (e.g., one of 62a, 62b or 62c in
In another version, the user is presented with a list of pairs of clustered splice points, with selection of a pair of clustered splice points resulting in splicing of the song using one of the pairs of splice points (e.g., A and A″ in
In some embodiments, a user may select a portion of the song waveform (or song lyrics) for which they would like to find a match, or may select a portion of the song waveform (or song lyrics) they would like to keep or may select a portion of the song waveform (or song lyrics) they would like to have deleted. The software will seek to find similarities to that user selected segment and/or keep such user selected segment. The words of the song may be displayed and respectively associated with appropriate portions of the song waveform to assist the user in this selection, as described above.
In some embodiments, a user may select a desired final song time and embodiments herein will analyze the various options to shorten the song with the grouped splice points and provide the song closest in length to the desired final song time. In some examples, the song may be slightly increased or decreased (e.g., plus or minus 10%) in speed to accommodate a desired time. In some examples, a detected fade-out of the end of the song may be accelerated to accommodate a desired time.
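Selecting the version closest to a desired time, and checking whether a small speed change can close the remaining gap, can be sketched as follows. The function names and the candidate lengths are assumptions; the 10% limit comes from the example in the text.

```python
def closest_version(version_lengths, target):
    """Length (in seconds) of the available version nearest the target time."""
    return min(version_lengths, key=lambda length: abs(length - target))


def speed_factor(length, target, max_change=0.10):
    """Playback speed multiplier that makes `length` last exactly `target`.

    Returns None when the required change exceeds max_change (e.g., 10%).
    A factor below 1.0 slows the song down; above 1.0 speeds it up.
    """
    factor = length / target
    return factor if abs(factor - 1.0) <= max_change else None


# Hypothetical lengths of the shortened versions available for one song.
versions = [62.0, 75.5, 88.0, 131.0]
chosen = closest_version(versions, target=80.0)
print(chosen, speed_factor(chosen, target=80.0))
```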
In some embodiments, a length of a presentation (or portion of a presentation) may be submitted or otherwise determined by various factors, and the desired final song(s) time may be used to adjust the length(s) of song(s) to fit the length of the presentation. For example, the user may select a time TS per slide (e.g., three seconds per slide) and provide a number N of slides. The length of one or more selected songs may be adjusted to approximate the overall time of the slide show (N×TS). After adjusting the length of the songs, the total song length may not exactly match (N×TS) due to splice point restrictions (or other criteria, such as user input criteria to maintain a minimum song length or to maintain certain lyrics). To provide an exact match between the presentation length and the total length of the selected one or more songs, after shortening these songs, the difference between the total length of the songs and the presentation length (N×TS) may be calculated and the time TS per slide may be adjusted to TS′ so that N×TS′ equals the total length of the songs. Alternatively, the time per slide need not be the same for each slide. In addition, video clips may by themselves establish a presentation length, or may be inserted in a slide show to affect the time of the overall presentation (e.g., the presentation length may be N×TS+TV, where TV equals the length of the video clips of the presentation), which may be used to adjust length(s) of song(s). Similarly, after length adjustment of the songs to be close to the desired time of presentation, the time per slide may be adjusted to TS′ so that N×TS′+TV equals the total time of the song lengths.
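The per-slide adjustment follows directly from N×TS′+TV = total song length, so TS′ = (total song length − TV) / N. A minimal sketch, with a hypothetical function name and sample numbers:

```python
def adjusted_time_per_slide(total_song_seconds, num_slides, video_seconds=0.0):
    """Solve N * TS' + TV = total song length for TS'."""
    if num_slides <= 0:
        raise ValueError("num_slides must be positive")
    remaining = total_song_seconds - video_seconds
    if remaining <= 0:
        raise ValueError("songs must be longer than the video clips")
    return remaining / num_slides


# Hypothetical show: 60 slides at 3 seconds each targets 180 seconds, but the
# closest achievable total song length after splicing is 174 seconds.
print(adjusted_time_per_slide(174.0, 60))                      # slides only
print(adjusted_time_per_slide(174.0, 60, video_seconds=24.0))  # with video clips
```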
In some embodiments, the desired time over multiple songs may be provided or selected by the user. For example, users may wish to create a slideshow that consists of multiple identified songs. The user may provide the songs and/or the identity of the songs to a computer system (e.g., a web server). The user can select a desired target length of the video and/or slideshow and the computer system will optimize the length of multiple songs to attempt to achieve that target. For example, each of the multiple songs may be analyzed to determine optimum or valid splice points and to determine the song length that results from using those splice points to shorten (or lengthen) the song. This may be done multiple times for each song, resulting in multiple song lengths for each song. Then, each version of the song may be matched with other versions to provide options for modifying the song length of the set of songs. The computer may determine the set of versions (or provide several sets of versions) that best matches the desired set length. Alternatively, the computer system may provide sets of versions of songs and corresponding set lengths to a user who may then select a desired set (e.g., corresponding to a desired song set length of time).
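Matching versions of several songs against a target set length can be sketched as an exhaustive search over one version per song. The function name and the candidate lengths are assumptions; exhaustive search is shown because it is easy to follow, and it is only practical when the number of songs and versions is small.

```python
from itertools import product


def best_set(versions_per_song, target):
    """Pick one version length per song so the total is closest to target.

    versions_per_song is a list with one entry per song, each entry being a
    list of candidate lengths in seconds for that song.
    """
    return min(product(*versions_per_song),
               key=lambda combo: abs(sum(combo) - target))


# Hypothetical candidate lengths (seconds) for three songs.
versions = [[200, 150, 95],
            [240, 180, 120],
            [180, 90]]
choice = best_set(versions, target=360)
print(choice, sum(choice))
```

Sorting all combinations by the same key, instead of taking only the minimum, would yield several sets of versions to offer to the user.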
In some embodiments, songs may be adjusted to provide a shortened or repetitive portion of a song to create a ringtone for a phone. In some examples, the method may include analyzing the song for similarity matches and then finding the best similarity matches that correspond to a chorus, which is commonly referred to as the “hook” in a song. The ringtone detector may enable a telephonic device such as a smart phone to begin playing the audio at the detected beginning of the hook, and play it until the end before looping back and replaying until the call has been answered or has timed out. The ringtone playback application may use the original song digital file stored in a music file library by providing a set of instructions (e.g., a pointer list to locations within the song digital file) to play the desired portion of the song and repeat the same. Ringtones may also include two or more discrete segments of a song (e.g., with a section of the song therebetween skipped or deleted).
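The pointer list idea can be sketched as a list of (start, end) sample ranges that is cycled over the unmodified song until playback stops. The function name, the sample limit standing in for "answered or timed out", and the ranges are assumptions for illustration.

```python
from itertools import cycle, islice


def ringtone_samples(song, pointer_list, max_samples):
    """Return up to max_samples by looping over (start, end) ranges of song.

    The song itself is never modified; the pointer list only says which
    sample ranges to play and in what order.
    """
    if not pointer_list:
        return []
    if any(not 0 <= start < end <= len(song) for start, end in pointer_list):
        raise ValueError("each range must be non-empty and lie within the song")

    def looped():
        for start, end in cycle(pointer_list):
            yield from song[start:end]

    return list(islice(looped(), max_samples))


song = list(range(20))            # stand-in for the stored song samples
hook = [(5, 8), (12, 14)]         # two discrete segments, section between skipped
print(ringtone_samples(song, hook, max_samples=12))
```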
Although
For example, the computer 40 may include a song in the user's music files 44 stored in computer readable medium 42. The computer 40 may transmit the identity of the song (as a result of a user selection) to the server 52. The song length adjustment application on server 52 may access the song from its own music files (or purchase the song if it does not exist in the music files) and determine one or more groups of splice points based on the methodology described herein. All or some of the groups of splice points may be transmitted to computer 40, which may then use the groups of splice points to adjust the length of the song (such as disclosed herein). The groups of splice points of the song may be stored with server 52 so that similarity analyses described herein need not be repeated at a later time when the server 52 receives instructions regarding adjustment of the length of the same song; the same groups of splice points previously calculated may be used to adjust the length of the song for a different user, who may have different criteria for altering the length of the song.
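Storing the groups of splice points so that the analysis runs once per song can be sketched as a small cache keyed by song identity. The `SplicePointStore` class, the `analyze` callback and the song identifier are hypothetical; an actual server would likely persist the results in a database rather than in memory.

```python
class SplicePointStore:
    """Caches groups of splice points per song so analysis is not repeated."""

    def __init__(self, analyze):
        self._analyze = analyze   # callable: song_id -> groups of splice points
        self._cache = {}
        self.analysis_runs = 0    # counts how often the costly analysis ran

    def groups_for(self, song_id):
        if song_id not in self._cache:
            self._cache[song_id] = self._analyze(song_id)
            self.analysis_runs += 1
        return self._cache[song_id]


def fake_analysis(song_id):
    """Stand-in for the similarity analysis; returns made-up splice groups."""
    return [[(30.0, 90.0), (30.0, 150.0)]]


store = SplicePointStore(fake_analysis)
store.groups_for("song-123")      # first user: analysis runs
store.groups_for("song-123")      # second user: cached groups are reused
print(store.analysis_runs)
```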
As shown in
In another example, database 110 may include record entries for user ratings of the shortened versions of the song. As shown in
It should be emphasized that many of the embodiments disclosed herein are not mutually exclusive but are useable with one another, in whole or in part. It is impracticable to set forth a separate description for each possible combination of features of the embodiments described herein, and thus a particular combination of features according to the invention may be described in connection with separate embodiments in this disclosure. While example embodiments have been particularly shown and described, it will be understood by one of ordinary skill in the art that variations in form and detail may be made therein without departing from the spirit and scope of the attached claims.
This application claims priority to and is a non-provisional of provisional application Ser. No. 61/903,960, filed Nov. 14, 2013, the entire contents of which are incorporated by reference in their entirety.
Number | Date | Country
---|---|---
61903960 | Nov 2013 | US