The present disclosure is generally related to digital music experiences and more particularly is related to systems and methods for converting music into segmented digital assets for dynamic uses in digital experiences.
Music is an integral part of most entertainment experiences, and especially digital entertainment experiences. Within films, electronic games, or other interactive entertainment experiences, music is often used as a soundtrack in order to enhance the entertainment experience for the user. For example, within electronic games specifically, such as video games or computer games, a music-based soundtrack is often used to highlight the experience of the game, such as by using fast paced and powerful music to amplify intense sequences of the game, or by providing calming, relaxed music to de-emphasize less intense sequences of the game.
With these electronic games, the soundtrack used is almost always created originally, from scratch, by musicians and artists who are hired by the game developers. The game developers provide the musicians with instructions or direction on the specific type or style of music, or characteristics of the music that they're seeking for the game, and the musician composes new music for the game developer's consideration. The music generated is specifically composed to fit within one or more sequences of a game. The game developer can then select the newly-composed music to be correlated to specific parts of the game, for example, by hardcoding the music into the game at a particular timeframe or to correspond to a particular setting in the game, such that the music plays at the desired timeframe or setting.
Naturally, composing new, original music for each new electronic game is a time-consuming endeavor which often adds substantially to the costs of producing a game. While it would be convenient to use music that is already available, such as tracks or compositions recorded by well-known artists, this type of music is not used in electronic games for various reasons. For one, the expense of licensing the music from the artist or music label is often prohibitive. Additionally, this type of music cannot be easily added to a game without substantial modifications and formatting to the song itself in order to tailor the song to the part or parts of the game.
Thus, a heretofore unaddressed need exists in the industry to address the aforementioned deficiencies and inadequacies.
Embodiments of the present disclosure provide a system and method for converting music into segmented digital assets for dynamic uses in a digital entertainment experience. Briefly described, in architecture, one embodiment of the method, among others, can be broadly summarized by the following steps: uploading at least one digital music file to a computerized system, the at least one digital music file having at least one song; processing, with a computerized processor of the computerized system, the at least one digital music file to identify segmented portions of the at least one song; separating the at least one song into the segmented portions; analyzing the segmented portions of the at least one song to identify at least one musical quality of each of the segmented portions, wherein an emotion, style, or vibe attribute is correlated to each of the segmented portions based on the at least one musical quality identified; and constructing a composite soundtrack from the segmented portions by arranging at least a portion of the segmented portions based on the emotion, style, or vibe attribute correlated with each of the segmented portions.
The present disclosure can also be viewed as providing systems or methods of providing a composite soundtrack for use in a digital entertainment experience. In this regard, one embodiment of such a method, among others, can be broadly summarized by the following steps: providing a computerized system having a database storing a plurality of digital music files' time-index attributes, each of the plurality of digital music files having at least one song; selecting, by a user, one or more songs to be used to generate a soundtrack for the digital entertainment experience; mapping a plurality of events within the digital entertainment experience to a plurality of emotion, style, or vibe attributes, respectively; constructing a composite soundtrack from segmented portions of the one or more songs, the segmented portions being selected based on at least one musical quality therein, wherein at least one of the segmented portions is correlated to each of the plurality of emotion, style, or vibe attributes to form the composite soundtrack from an arrangement of the segmented portions; and outputting the composite soundtrack to be played within the digital entertainment experience.
Optionally, in one example, playing of the soundtrack may connect the user back to the original song from which a segment of the song, such as a segment currently playing, came.
The present disclosure can also be viewed as providing systems or methods of converting music into segmented digital assets for dynamic uses in a digital entertainment experience. In this regard, one embodiment of such a method, among others, can be broadly summarized by the following steps: processing, with a computerized processor of the computerized system, at least a first digital music file or a stream of the first digital music file to identify segmented portions of at least one song within the first digital music file; separating the at least one song into the segmented portions; analyzing the segmented portions of the at least one song to identify at least one musical quality of each of the segmented portions, wherein an emotion, style, or vibe attribute is correlated to each of the segmented portions based on the at least one musical quality identified; processing, with the computerized processor of the computerized system, at least a second digital music file or a stream of the second digital music file to identify segmented portions of at least one song within the second digital music file which matches an emotion, style, or vibe attribute of at least one segmented portion of the first digital music file; and constructing a composite soundtrack from the segmented portions of the first and second digital music files by arranging the segmented portions based on the emotion, style, or vibe attribute correlated with each of the segmented portions.
Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.
Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
To improve over the shortcomings, the present disclosure is directed to systems and methods for converting music into segmented digital assets for dynamic uses in digital experiences, such as computer games and video games. As will be described, the systems and methods of the present disclosure effectively allow for any existing song, track, or other musical composition, such as songs or tracks produced by any mainstream or independent, unsigned, unrepresented artist, to be used to build a composite soundtrack for an entertainment experience, which may be referred to herein as ‘dynamic soundtracking.’ In one example, this dynamic soundtrack is a replacement for conventionally composed soundtracks for digital entertainment. In another example, the dynamic soundtrack is not merely a replacement soundtrack for conventional soundtracks in digital entertainment, but rather, it is a dynamic composition made from segments which are dynamically formed into a composite soundtrack which can be specifically tailored, either manually or automatically, to the events within a digital entertainment experience.
It is noted that the subject disclosure can be used with any digital entertainment experience, including interactive entertainment, such as video or computer games, e.g., Xbox, PlayStation, Nintendo, downloadable computer games for PC or IOS, streaming computer games, or any combination thereof, as well as non-interactive visual entertainment, such as films, movies, videos, shorts, clips, etc. For clarity in disclosure, the subject invention is described herein relative to the gaming industry, namely with computer or video games since it offers substantial benefits to the gaming industry where soundtracks are used to accompany the visual interaction a user experiences with the events in a video game or computer game. However, it is not limited to only this use, as the present disclosure can be used with any digital entertainment experience including but not limited to dynamic soundtracking a person having a physical fitness experience (with a device such as a FITBIT® or smartphone aware of the user's movements and intensity), a digital advertisement such that the composite soundtrack adapts to the viewer, and/or online streaming such that the viewers can select their own soundtrack to play during the viewing. Any other digital experience not specifically mentioned herein is also considered within the scope of the present disclosure.
As a general overview of the subject disclosure,
The user interface (UI) 20 includes a song uploading tool which is configured to receive the digital file upload, or porting-in of a third-party hosted music stream, of a song or musical work 22 from a user, which allows for any current or former musical artist to submit his or her musical work 22 for use as a soundtrack. The musical work 22 may include individual songs, full albums, collections of musical works, or any other type of musical compositions, in any genre, style, or digital format. The musical artist may be any artist or representative thereof, such as well-known or popular musical acts of the present or past, or their labels or rights holders, indie artists, underground artists, independent artists, etc., regardless of their success, popularity, or fame. The musical work 22 may be uploaded or otherwise communicated from the user's computing device or database through the song uploading and stream-linking interface 24 in the UI 20 with any known computing technique, whereby a digital audio file, e.g., M4A, MP3, MP4, AAC, FLAC, WAV, WMA, stream, etc., is received within a server or other computerized system of the subject invention. As an alternative to an upload of the digital musical file, the upload interface 24 may allow for pointing to the digital music file in another database, e.g., where metadata can point to existing stream locations such as SPOTIFY®, such that the digital musical file itself is not uploaded into the system.
Once the musical work is uploaded or streamed to the server 12, the processor 30 processes the musical work 22 to identify portions of the musical work 22 which can be segmented and assigned time-index attributes, and then separates those segmented portions 32 from one another. This process of segmenting allows for the entirety or a portion of the musical work to be separated or parsed into one or more smaller portions based on musical differences between the segments 32. For example, a part of a song with relatively soft, melodic clean guitar picking may be segmented from a chorus of the song, which is faster, louder, and has a distorted guitar sound. Naturally, there may be a large number of ways to segment a given song, including by part of the song (intro, verse, chorus, bridge, outro, etc.), by musical consistency within the segment, such that the musical consistency is substantially different from that of neighboring segments (musical compositional themes), by multi-variant-derived intensity such as via onset density and variation thereto, by specific or composite dynamic ranges imputing a level, slope up, or slope down of singular or composite features, by volume, beat, tone, tempo, or timbre, by the presence or absence of vocals, by the presence or absence of certain instruments, by the key of the song, by the chords or chord progression of a song, by a natural starting or stopping point of a portion of the song (loopability), or any other characteristic of the song, any combination of which may be used.
Once identified, the processor 30 then separates each of these segmented portions 32 from one another, such that there is an identifiable beginning and end to each segment. For example,
The segmented portions 32 are analyzed to identify their musical qualities, e.g., the character or feel of each segment 32, such that each segment can be categorized with one or more particular emotion, style, or vibe attributes, i.e., perceived or anticipated emotional reactions, moods, or perceived atmospheres that a human would have when hearing the segment. Continuing with the previous example, the soft, melodic verse with clean guitar picking may be identified to have a low intensity musical quality which correlates to a happy emotion, style, or vibe attribute, whereas the louder, faster chorus with distorted guitar may be characterized as having a higher intensity musical quality which correlates to an angry emotion, style, or vibe attribute. Similar qualities can be correlated to a style or vibe of a particular song.
There may be many types of musical qualities which can be identified with a segment, such as, for example, low intensity or low energy, high intensity, or high energy, high or low stress, an intensity slope that increases, decreases, or is constant, emotion state, emotion vector, music complexity consistency, dynamic music consistency, etc. Similarly, there may be many types of identifiable emotion, style, or vibe attributes, and there are many ways to correlate a particular emotion, style, or vibe attribute with the musical quality. For instance, emotion attributes may include angry, content, happy, sad, relaxed, stressed, nervous, confident, scared, brave, etc., among many others, whereas style attributes may include aggressive, soft, playful, and grating, among many others, and vibe attributes may include relaxed, intense, chill, or anxious among many others. Collectively or individually, these emotion, style, or vibe attributes may be correlated with the musical quality in various ways. For example, musical qualities of high energy, high stress may be correlated with the emotion, style, or vibe attribute of angry, whereas high energy, low stress may be correlated with being happy, and low energy, low stress may be correlated with relaxed, whereas low energy, high stress may be correlated with sad. Each segmented portion may be identified with a musical quality such that each segment can be assigned an emotion, style, or vibe attribute.
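As a non-limiting illustration of the energy/stress pairing described above, the following Python sketch shows one way such a correlation could be encoded; the function name, thresholds, and labels are hypothetical and purely illustrative, not the actual mapping used by the system.

```python
# Minimal sketch (hypothetical names and thresholds): mapping measured musical
# qualities of a segment to an emotion, style, or vibe attribute using the
# energy/stress pairing described above.

def correlate_attribute(energy: float, stress: float) -> str:
    """Map normalized energy and stress scores (0.0-1.0) to an emotion attribute."""
    high_energy = energy >= 0.5
    high_stress = stress >= 0.5
    if high_energy and high_stress:
        return "angry"
    if high_energy and not high_stress:
        return "happy"
    if not high_energy and not high_stress:
        return "relaxed"
    return "sad"

# Example: a loud, tense chorus maps to "angry".
print(correlate_attribute(energy=0.9, stress=0.8))
```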
Once the segments 32 of the musical work 22 are matched with the identified musical qualities and the emotion, style, or vibe attribute, the segments 32 may be combined or arranged into a composite soundtrack 42. It is preferable to eliminate gaps within the soundtrack, i.e., the inadvertent absence of music, such that the segments 32 are strung together to form a cohesive composite soundtrack. Further, the “loopability” and “loop non-repetitiveness” can be analyzed to support the overall quality of the arranged soundtrack element. This composite soundtrack 42 may utilize some, all, or any number of the identified segments from the musical work 22, or in many situations, it may use segments 32 from a plurality of different musical works 22 which are arranged together into a soundtrack. The arrangement of the composite soundtrack 42 may be based on various criteria, but commonly, it may be arranged such that the musical qualities and/or the emotion, style, or vibe attributes of the segments 32 correspond to or match events within a particular digital entertainment experience.
As an example of a composite soundtrack formed from a plurality of songs, a computerized processor of a computerized system may process at least a first digital music file or a stream of the first digital music file to identify segmented portions of at least one song within the first digital music file. The at least one song may be separated into the segmented portions. The segmented portions of the at least one song are analyzed to identify at least one musical quality of each of the segmented portions, wherein an emotion, style, or vibe attribute is correlated to each of the segmented portions based on the at least one musical quality identified. Then, the computerized processor of the computerized system processes at least a second digital music file or a stream of the second digital music file to identify segmented portions of at least one song within the second digital music file which matches an emotion, style, or vibe attribute of at least one segmented portion of the first digital music file. A composite soundtrack may then be constructed from the segmented portions of the first and second digital music files by arranging the segmented portions based on the emotion, style, or vibe attribute correlated with each of the segmented portions. This process may be iterative with any number of additional songs, where the system iteratively searches for additional digital music files which have segments which match the emotion, style, or vibe attribute of at least one segmented portion of the first or second digital music files, such that the composite soundtrack can be constructed from segmented portions of the first, second, and additional digital music files. This allows for a situation where a user, or the system itself, can explore and easily discover lots of segments that can work with another segment to achieve enough music to fill a soundtrack from a single segment, e.g., a “very interesting” segment, that itself may be too short to comprise a soundtrack alone.
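The following Python sketch illustrates, under assumed and hypothetical data structures (the Segment class and attribute labels are not defined by the disclosure), how segments from additional songs could be gathered around a short "seed" segment by matching its attribute until enough material exists for a soundtrack.

```python
# Illustrative sketch (hypothetical data model): iteratively pulling in segments
# from additional songs whose attribute matches a "seed" segment.

from dataclasses import dataclass

@dataclass
class Segment:
    song_id: str
    start_s: float
    end_s: float
    attribute: str  # e.g. "angry", "happy", "relaxed"

def matching_segments(seed: Segment, candidate_songs: list) -> list:
    """Collect segments from other songs whose attribute matches the seed segment."""
    matches = [seed]
    for song_segments in candidate_songs:
        matches.extend(s for s in song_segments if s.attribute == seed.attribute)
    return matches

def total_duration(segments: list) -> float:
    """Running total used to decide when enough music has been found for a soundtrack."""
    return sum(s.end_s - s.start_s for s in segments)
```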
As an illustration, the backing track with low intensity may be played when a video game player is in the midst of non-eventful game play, e.g., travel from one location to another. When the player encounters an enemy and battles them, the backing track with high intensity may be played. If the player is victorious, the victory moment sub-arrangement 44 may be played, and if the player is defeated, the death moment sub-arrangement 44 may be played. Thus, as can be seen, each game or other entertainment experience can be described by events or reasons to play a different soundtrack element, which effectively acts as a map to the use of a portion of the composite soundtrack 42. Keying or correlating the events of the video game to the specific quality or emotion of the sub-arrangement 44 allows the composite soundtrack 42 to be used throughout the game for all identified events, such that the player of the game hears the corresponding part of the composite soundtrack 42 when he or she encounters a particular location, time period, or interaction within the game. Thus, the system 10 is able to generate and output a composite soundtrack 42 which is correlated to, or responds to, the changing context events of the game being played as it is played.
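A minimal sketch of such keying is shown below, assuming hypothetical event names and a simple dictionary lookup; the actual mapping mechanism may differ.

```python
# Minimal sketch (hypothetical event names): keying game events to emotion, style,
# or vibe attributes so the corresponding part of the composite soundtrack plays.

EVENT_TO_ATTRIBUTE = {
    "travel": "relaxed",    # non-eventful movement -> low-intensity backing track
    "combat": "angry",      # enemy encounter -> high-intensity backing track
    "victory": "confident", # victory moment sub-arrangement
    "defeat": "sad",        # death moment sub-arrangement
}

def select_segment(event: str, segments_by_attribute: dict) -> object:
    """Return the next queued segment whose attribute matches the current game event."""
    attribute = EVENT_TO_ATTRIBUTE.get(event, "relaxed")  # fall back to a neutral bed
    queue = segments_by_attribute.get(attribute, [])
    return queue[0] if queue else None
```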
When the composite soundtrack 42 is generated, it may be output from the system 10 in various methods. For instance, the composite soundtrack 42 may be downloadable by a developer and integrated into a game or other form of entertainment. It may also be downloadable by a user of the game, such that the user can select which soundtrack he or she prefers to have with the game. The composite soundtrack 42 may also be communicated with a query API, such that the composite soundtrack 42 can be directly streamed into any network-enabled game. Within the system 10, it may be desirable to use a soundtrack library which has one or more databases 40 which store previously-composed soundtracks and/or released soundtracks. This may allow developers or users to search for particular composite soundtracks 42.
In some situations, it may be desirable for the composite soundtrack 42 to be implemented as a static soundtrack, whereby the identified segments 32 of a musical work 22 are played when a corresponding event in the video game occurs, and the same segments 32 are played for their corresponding game events, respectively. However, the system is also capable of generating dynamic composite soundtracks. As each of the musical works 22 has the individual segments 32 thereof identified based on musical quality and emotion, style, or vibe attribute, and because the events of the game are also correlated or matched to a desired musical quality or emotion, style, or vibe attribute, it is possible to dynamically interchange virtually any compatible musical work 22 as a composite soundtrack 42 for a game. Thus, a user may enjoy the ‘swapability’ the system offers, where they can play the same video game, and the same events within that video game, with a new soundtrack each time. In addition, the ability to stream a song from a third party streaming service directly from the experience can be supported.
The game developer or the user may have near unlimited variations in the soundtrack of the game, which in turn, can significantly affect the feel of game play. For instance, a player of a combat video game may be able to play a particular combat scene with classical music at one point in time, heavy metal music at a different time, or hip-hop music at yet another time, each of which can provide a different experience to the user. Additionally, the composite soundtrack 42 generated by the system may be implemented as a package or pack which includes the sub-arrangements 44 along with the original search criteria for contextual song replacement downstream. This can be implemented as “wrapper” code around a “compiled” package of music, or it can be implemented as a “wrapper” code that calls the stream engine real-time. This allows for the composite soundtracks 42 to be stored remote from the game itself but streamed to the local computing device on which a game is played.
It is noted that the composite soundtrack as played in each user session is saved and can be replayed, such that one could, as an example, play a video game and really enjoy the soundtrack experience, then play that soundtrack back while running or exercising to relive the gameplay as inspiration for their workout.
The disclosed system and method offer numerous benefits within the music industry and within the digital entertainment industry. At a base level, the system and method allow digital entertainment developers to move away from the conventional methods of composing soundtracks, e.g., where an artist is hired to compose a soundtrack from scratch specifically for the entertainment experience. Instead, these developers can use virtually any existing song from any artist, which saves the time and expense of composing new and original soundtracks. For the musician, the system and method allow for existing artists and songs to be utilized and integrated into more digital entertainment, which expands both the exposure of the artist or song and his or her earning capacity based on additional licensing fees from the use of his or her music.
Moreover, since the system and method allow for the dynamic interchange of any compatible musical work into digital entertainment, the soundtrack for any particular game or video can be repeatedly changed and refreshed such that the same gameplay can be used with different soundtracks to make for different playing experiences. This allows developers to keep their products enticing to existing customers for longer periods of time, and to attract new users for their products. For example, it may be possible for game players to “point” their game to music they like that already exists, and the system 10 can restructure that song into a composite soundtrack for their gameplay in real time.
Another benefit of the disclosed system and method is that it may allow existing game players to be exposed to new music. As a player is playing a game and hears a soundtrack that they enjoy, the player may be able to use the system 10 to identify the underlying artist or song used to generate that soundtrack. Accordingly, the system 10 may have a link or similar feature which links or electronically connects the player or user back to the artist, the original song, and/or a position within that original song which was played in the soundtrack. The player may then be able to listen to the song or purchase a copy of it, either through the system 10 directly or within a third-party database or application.
Segmentation:
As previously noted, segmentation of the musical work may occur using various techniques. For instance, segmentation may include a comparison of the similarity of musical features, predominantly in the spectral domain. A simple feature may be the energy contained in different frequency bands, and one representative example of these types of features is Mel Frequency Cepstral Coefficients (MFCCs). These features may be understood as a vector of numbers generated approximately every 10 milliseconds, which are representative of the timbral characteristics of the audio signal within the musical work. For a song having a given length, similarities in the long string of vectors which form the song are analyzed, such as by computing a distance metric between pairs of vectors, which can be plotted in a matrix indexed by the time offset of one feature vector i against the time offset of another feature vector j, where i and j are offset counters. This distance function could be Euclidean distance or some other suitable function. If the two vectors are similar, a low value is seen, whereas if the two vectors are different, a high value is seen. This is called a similarity matrix. It is then possible to observe squares of low values forming along the diagonal for segments containing similar content. When a proper threshold is applied, it is possible to determine points in time where one segment ends and another segment begins.
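A minimal sketch of this similarity-matrix approach, assuming the librosa and SciPy libraries and a hypothetical input file, is shown below; the frame rate and threshold are illustrative rather than the specific values described above.

```python
# Illustrative sketch: compute MFCC feature vectors and build a self-similarity
# (distance) matrix; squares of low distance along the diagonal suggest coherent
# segments, and transitions between them suggest boundaries.

import librosa
import numpy as np
from scipy.spatial.distance import cdist

y, sr = librosa.load("song.wav", sr=22050)          # hypothetical input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
frames = mfcc.T                                      # one timbral vector per frame

# Euclidean distance between every pair of frame vectors i and j.
similarity_matrix = cdist(frames, frames, metric="euclidean")

# Apply a threshold: runs of "similar" cells along the diagonal mark candidate
# segments; the points where the pattern changes mark candidate boundaries.
threshold = np.percentile(similarity_matrix, 25)
similar = similarity_matrix < threshold
```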
Another approach for segmentation may include the use of a novelty measure, which determines, again on the string of vectors, the self-similarity of the vectors in a very short time window. If the vector values start to change dramatically, a segment boundary is introduced. This is more practical for algorithms that need to run under streaming conditions, or in memory constrained environments. However, if the underlying signal changes a lot, for example, as may be common in some forms of Jazz music, it is possible to have numerous very small segments. Both of these techniques, the novelty metric and similarity matrix, can be used in various combinations.
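A short sketch of a novelty-style detector is given below, with assumed window and threshold values; it simply flags frames whose feature vector departs sharply from the recent past, which is why it suits streaming or memory-constrained use.

```python
# Minimal sketch (assumed parameters): flag a boundary when the current frame's
# feature vector jumps away from the mean of the preceding short window.

import numpy as np

def novelty_boundaries(frames: np.ndarray, window: int = 8, threshold: float = 3.0) -> list:
    """frames: (n_frames, n_features) feature vectors; returns boundary frame indexes."""
    boundaries = []
    for i in range(window, len(frames)):
        recent_mean = frames[i - window:i].mean(axis=0)
        novelty = np.linalg.norm(frames[i] - recent_mean)
        if novelty > threshold:
            boundaries.append(i)
    return boundaries
```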
Another technique for segmentation may include calculating a chromagram, which is a spectrogram where the energies in the frequency bands of the song are mapped into one octave, and which describes the harmonic content of a musical signal. This method reflects mostly melodic and chord information of the song, but it is also possible to exploit rhythmic signal elements such as onset density over time, though given a lot of music is rhythmically similar in itself, this may have limited applicability.
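For completeness, a brief sketch of computing a chromagram with the librosa library (assumed available) follows; the input file is hypothetical.

```python
# Short sketch: a chromagram folds spectral energy into the 12 pitch classes of one
# octave, exposing the harmonic/chord content that this segmentation technique uses.

import librosa

y, sr = librosa.load("song.wav")                  # hypothetical input file
chroma = librosa.feature.chroma_stft(y=y, sr=sr)  # shape: (12, n_frames)
# Changes in the dominant pitch-class pattern over time suggest chord or section changes.
```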
Instead of these deterministic approaches, it may be more advantageous to use neural networks to conduct segmentation, where the neural net is fed either the music signal directly, or the output of a frequency transform, or a vector stream of features as described in the deterministic approaches. It may be advantageous for the neural net to be in the form of a recurrent neural net (RNN), as this architecture is good for capturing evolution and/or change over time. The neural network is then trained using manually created examples. With this approach, it is possible to create hundreds of thousands of segments and train the neural network to distinguish boundaries between these segments, or lack of boundaries within each segment.
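The following PyTorch sketch shows one possible, untrained form of such a recurrent boundary detector; the layer sizes and training targets are assumptions for illustration only.

```python
# Illustrative sketch: a recurrent network that consumes a stream of per-frame
# feature vectors and emits a per-frame boundary probability, to be trained on
# manually or systemically labeled segment boundaries.

import torch
import torch.nn as nn

class BoundaryRNN(nn.Module):
    def __init__(self, n_features: int = 13, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # one logit per frame: boundary vs. no boundary

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_frames, n_features) -> (batch, n_frames) boundary logits
        out, _ = self.rnn(x)
        return self.head(out).squeeze(-1)

model = BoundaryRNN()
loss_fn = nn.BCEWithLogitsLoss()  # targets: 1 at labeled boundary frames, 0 elsewhere
```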
The segments could be human annotated, or the output of a systemic approach as detailed previously. It is also possible to arbitrarily combine portions from separate songs and call those junctions boundaries, whereas each song itself in its sequence would be considered one segment. The few segment boundaries within each song would be considered false positives and are likely to be ignored as ‘noisy data’ by the neural network.
Once the existence and location of a segment within a song is determined, there are some post processing steps that can compare segments within a song to each other, and determine whether these are similar segments (e.g., multiple chorus or verse parts) or different. In this process, it is possible to apply a map of beats and bars to determine on which beat a segment begins or ends, to create loop-able material. For instance, beat and bar detection may utilize models where pulse frequency is determined over time in specific frequency bands, such as bass frequencies or high frequencies, to determine the pulse of a bass drum or hi-hat. These frequencies determine a window, within which periodic energy increases are determined, ideally in the time domain to preserve temporal accuracy, to find the beat onset location. Meter (bar information, i.e., 3/4, 4/4, 6/8, etc.) is determined again using periodicities in the temporal energy envelope of individual select spectral bands, but looking for a periodicity at a ratio lower than the beats, i.e., lower than the estimated beats per minute (bpm) value. For example, if the algorithm determined a song at 120 bpm, and there is another frequency peak at 30 bpm, it is very likely that the song has a 4/4 meter, whereas a 40 bpm peak would indicate a 3/4 meter. Once the meter and tempo of a song are identified, this information can be correlated with energy increases in the audio to create a beat grid, as previously described. It is then possible to take the segment boundaries and try to align them with the beat grid, which in most cases is as simple as finding the closest beat or bar position. It is also possible to use trained neural networks to solve tempo detection tasks, which generally works better than other techniques, especially in situations where the music signal is different enough to thwart the signal processing-based approaches. Neural networks are much better at generalization and figuring out the gestalt of music. The downside is that neural networks often need substantial training material, which needs to be generated primarily manually.
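A minimal sketch of estimating a beat grid and snapping segment boundaries to it, assuming the librosa library and hypothetical boundary times, follows.

```python
# Minimal sketch: estimate tempo and a beat grid, then align segment boundaries
# with the closest beat position to produce loop-able material.

import librosa
import numpy as np

y, sr = librosa.load("song.wav")                          # hypothetical input file
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)  # estimated bpm and beat frames
beat_times = librosa.frames_to_time(beat_frames, sr=sr)   # beat grid in seconds

def snap_to_beat(boundary_s: float, beats: np.ndarray) -> float:
    """Align a segment boundary (seconds) with the closest beat position."""
    return float(beats[np.argmin(np.abs(beats - boundary_s))])

segment_boundaries = [31.7, 62.2, 95.4]  # hypothetical detected boundaries, in seconds
aligned = [snap_to_beat(b, beat_times) for b in segment_boundaries]
```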
Music Intake Process:
With reference again to
In a preferred example, all tags are auto-generated, whereby all auto-generated tags are placed into an “Auto Tagged” database and the model/version used to generate the tag is stored. The same tag can be set by a user, or by any number of model/version combinations, into a “Tag Log,” and the system 10 or the user sets the “Selected Tag” from that set as the final tag associated with the actual entity. Areas of initial auto-tagging include: auto-structure tags (beats, bpm, bars), auto-segment tags (intro, verse, chorus, etc.), auto-point of interest tags (drops, high dynamic range over short time), auto-subject (based on lyrics within the time-bound of the entity), lyrical segments (time indexes of lyric blocks), and vocal presence flag.
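One way this tag log and selected-tag workflow could be modeled is sketched below in Python; the class and field names are hypothetical and not part of the disclosure.

```python
# Illustrative sketch (hypothetical schema): store every auto-generated or user tag in
# a tag log with the model/version (or user) that produced it, and record which entry
# is the final "Selected Tag" per tag type.

from dataclasses import dataclass, field

@dataclass
class TagEntry:
    tag_type: str  # e.g. "segment", "bpm", "subject", "vocal_presence"
    value: str
    source: str    # "user" or a model identifier such as "segmenter-v3"

@dataclass
class EntityTags:
    entity_id: str
    tag_log: list = field(default_factory=list)   # every auto or user tag ever set
    selected: dict = field(default_factory=dict)  # final tag chosen per tag type

    def log(self, entry: TagEntry) -> None:
        self.tag_log.append(entry)

    def select(self, entry: TagEntry) -> None:
        self.selected[entry.tag_type] = entry
```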
To upload a song or point or link to a streamable song, the song uploader 24 can be embedded in multiple pages and be available to various account types, such as an artist, a team member, etc. The musical work 22 file uploaded may be transmitted from the local computing device to the system's 10 storage database. While various filetypes may be accepted, it may be possible to transcode WAV files. The song upload may be logged in a database of the system 10, where the upload ID (UID) of the upload itself is stored with the file. The uploaded song may then get presented for tagging and/or association. It may be possible to have an “Upload Into” folder structure for the artist, album, or track. Registration or association of the uploaded song may occur in two or more ways. If the song is uploaded into an existing track, that track ID is inherited, and the song is registered as the latest revision of the underlying song file for that track. If the track uploaded is already tagged, an open process to align the new song file with the existing song source being replaced may be used. This is referred to as an alignment process. Within the alignment process, the system opens the original song and the current song in parallel players. The original song is played to an identifiable place “key” and the current song is played to the same audible “key.” A user may then align the two songs via a time-offset value saved into the song version. However, if the song is not uploaded into an existing track, the system 10 assigns a new, unique track ID to the song. System 10 then presents a form to the user to fill in details, such as the name of the song, the artist, and the genre of the song, among other possible data.
Next, the system tags at the track level. The first tagging occurs based on genre style or other common metadata associated with the song file. Next, first order tagging is done, which includes emotion tagging, lyrics, and subject tagging. With emotion tagging, an emotion vector and an intensity slope may be used, or the emotion vector may be based on the main emotion category plus an energy scalar plus a stress scalar. The second order of tagging, which is optional, tags based on moods, emotions, subjects, or other similar attributes.
Next, the system tags track structure, which may be achieved using a clipping tool. The entire track waveform is presented with a player and a time index selector. The structure tags are then set and/or confirmed. These include beat indexes, bar indexes, beats per minute, and structure shift points if a song has more than one beat structure over its length. Next, the lyrical segments are set and/or confirmed, which identifies lyrical blocks with a start and end time segment. The segment tags are then set and/or confirmed. These include, for example, tagging the various parts of the song, such as the intro, the chorus, the verses, or any other start and stop points of the song. This also includes selecting the lyrics within each segment and flagging the presence of vocals. The point of interest tags may then be set and/or confirmed. These include 3-7 second segments of interest, often drops or high dynamic ranges over a short period of time. Next, segments are tagged. First, the genre style or other common metadata of a song is inherited and optionally reset. Then, the first order tagging is parroted and optionally reset, again including emotion tagging, lyrics, and subject tagging. Finally, second order tagging may optionally be completed to account for moods, emotions, subjects, etc. It is noted that second order tags can look up and set first order tags.
With regards to handling songs that are instrumental only, i.e., without vocals, a song is uploaded to become a track and then it gets tagged, as described above. When an instrumental version of a song is uploaded and there is already an existing non-instrumental version, the system presents an alignment interface to select an alignment key between the original and the instrumental versions. This allows for adjusting the instrumental song to the same time index as the original, therefore making all tags time index compatible between versions of the song. When a song is uploaded that is instrumental, and there is no existing song within the system, the uploaded song is treated as an original.
It is noted that the extraction of emotion or musical quality from a song can be achieved manually, such as through human listeners, or it can be performed autonomously, or a combination thereof. For example, emotion extraction may include multiple algorithms processing segments for emotion and emotion intensity based on characteristics such as beat rate, high dynamic range of beats or noise level, key, and tone, and/or certain musical signatures, as well as on previously machine-learned emotions for a segment which were based on a manual tag process. Similarly, algorithms and machine learning may also be used for segment normalization and benchmarking, where algorithms cause tags of different segments, even from different songs, to be identified as “compatible.” This supports selecting a segment, looking up its associated tags, then finding segments with the same tags to receive a collection of segments that identify the same or similar emotional, style, and content story. It is possible to match one or more of the tag types to achieve the same emotion but do so without being limited to a particular musical style. It is also noted that algorithms may be used for segment linking compatibility. For example, algorithms can ascertain if different segments are musically compatible to string together end-to-end such that they sound like a cohesive composition. This may use features such as vocals, anacrusis detection, intensity scaling, sonic qualities, beat positions, or others. The algorithm may drive auto-stitching of the segments into the composite composition without requiring crossfades or similar segues.
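A minimal sketch of this tag-based compatibility lookup is shown below; the tag keys and data layout are assumptions made for illustration.

```python
# Minimal sketch (hypothetical tag model): find segments "compatible" with a selected
# segment by matching one or more of its tags, so the same emotional story can be told
# without being limited to one musical style.

def compatible_segments(selected_tags: dict, library: list, match_keys=("emotion", "energy")) -> list:
    """Return library segments whose tags match the selected segment on the given keys."""
    return [
        seg for seg in library
        if all(seg.get("tags", {}).get(k) == selected_tags.get(k) for k in match_keys)
    ]

# Example: match on emotion and energy but not genre, allowing cross-style substitution.
```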
Composite Soundtrack by Context and Event:
Since each segment of the song has an identified music quality and emotion, style, or vibe attribute, the system is capable of playing any of the segments at any given time. As such, the system can coordinate playing a segment with a particular event in the video game. Examples of this are shown in
Selection of a Different Pre-Compiled Composite Soundtrack:
As shown in
With the functionality of the system as described relative to
The processor 30 may be in communication with various databases, such as a music database 40 and a business database. The databases may contain data stored in the system 10, for example, in the business database, data about artists, customers, fans, syncs, and soundtrack activity, and in the music database, master music data, subclips, soundtracks, and releases. The system may also include various input or output devices, such as a network connection to communicate with user computers, with games directly, or with other applications through API connectors. This connection may allow for the communication of data concerning the system, such as reporting the use and quantity of music play, for payment by the game developer or user, for payment to the artist, and/or connecting a user of the system to an artist through a social network or similar platform. A communication module may also be used to communicate to users through email, text, or in-app notifications. The system 10 may also have an interface which allows for interaction with the user within the game or entertainment experience itself, such that the player of the game can interact with the system 10 without leaving the game.
The search tool may allow users to search by various parameters, such as by keyword, or by a filtering tool, among others. For instance, a user can search based on emotion, subject, or style and receive a list of clips back. The clips may be interesting portions of a song which matched the query, such as those being 10 to 15 seconds long. In the example of
It is noted that an elastic search synonym plugin may also be used for turn-key mapping of emotion words to the system's taxonomy, while saving the original word requested in a log for later direct-tagging. The user can drop clips into their Projects/Playlist builder, can dial-up and dial-down energy and stress scalars to refine the search return, i.e., “Even happier” or “Even more angry”, can select a clip to be final for the section, and can select for a clip to be stretched.
Additionally, the search query tool may allow someone to iteratively build up a store of soundtracks that they're interested in. For instance, someone can have a “My Project” virtual container to drop soundtrack candidates into for buyers of music. There can also be a “My Playlists” virtual container to drop soundtrack candidates into for fans of music, e.g., playlists such as “workout” or “road trip” etc.
The search tool may also be a purchasing support tool. For example, buyer-type users may pre-buy soundtracks for their project, then use the search tool over time to iterate on possible soundtrack “seeds”. Once they have the sound they like, they can finalize their selection, such as by selecting the clip and getting the song as-is in exchange for a pre-purchased credit, or selecting the clip but specifying adjustments, such as length, stringing multiple clips together, etc. If the specified adjustments can be automatically made, a smaller “customization price” can be ascribed, which requires additional credits to be purchased, whereas if the specified adjustments must be made manually by the system, a larger “customization price” can be ascribed, which requires more credits to be purchased. When customization is selected, the system's workflow will ensure it is completed, such as by employees of the system that use editor tools, by connecting the artist themselves to custom make a song to fulfill the requested soundtrack, or with another approach. When a soundtrack is finalized, either customized or not, it is released and given a unique ID and bound to the media it is connected to, which also gets a unique ID. Additionally, a finalized soundtrack may be attached to the originally intended project, and also cross promoted. This may include conducting a launch campaign for the artists to promote the released game, video, or other entertainment experience, or a launch campaign for the buyer to promote the artists, among others.
As can be seen, the system 10 is in communication with other systems and databases, for example, a system with original released music sources 102 and a system which connects to dynamic soundtrack clients 112. As can be seen, a user within the dynamic soundtrack client block 112 can be currently listening to a segment of a song. When the segment of the song is played, the system 10 may receive a ping at 114. The ping from 114 is received by the system 10, which communicates with the original released music sources 102 to connect the user to the song from which the segment is derived. This may include connecting the user to a third party streaming service, a third party database having the song, or released music registries, among other possible sources. These sources can provide the particular song back to the system 10, where it is ingested or registered, and can be linked to the user specifically. For example, the particular song that the user liked a segment from can be communicated to the user such that the user can listen to that song, such as by streaming it or purchasing it.
Additionally, that song can be used to further compile or stream a dynamic soundtrack to that user, whereby the system 10 recognizes that the user's interest in that segment can be used to build further soundtracks for that user. The system 10 can also generate dynamic soundtracks for various activities with which the user is engaged. For example, for a user who is playing a video game and finds a segment of a soundtrack that they like, the system 10 can compile a new soundtrack based on the artist of that segment to be used by the user for exercise, driving, or other activities apart from a video game.
It is noted that the system 10 can utilize Artificial Intelligence (AI), as well as machine learning, neural networks, or other automated computer processing techniques to perform some or all of the functionality described herein. For example, a neural network can be used to determine if a segment is loop-able, or if a segment is cut right for a good transition point. The neural net can be trained to judge what a good transition sounds like, and then it can synthesize a loop or a transition with a set of candidate transition points, whereby the neural net can determine, from these synthesized transitions, which point is the best transition point. This use of the neural network can avoid jarring transitions between segments that will not make sense to a consumer. Additionally, AI processing can be used in equalization and volume normalization. Once segments are identified for transition, the EQ and volume can be adjusted so the segment is seamless. This can be done in a fashion similar to that described herein relative to neural networks with segment looping. The neural network is trained on what acceptable volume/EQ differences are, and then once all segments for a game are identified, it is possible to create all transitions, and create normalization and EQ info for each segment from that.
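The candidate-scoring idea can be sketched as follows, where the scoring model is an assumed, already-trained network and the splicing is simplified to plain concatenation for illustration.

```python
# Illustrative sketch (hypothetical scoring model): synthesize each candidate
# transition by splicing the end of one segment onto the next, score it with a
# trained "does this sound like a good transition?" model, and keep the best point.

import numpy as np

def best_transition_point(seg_a: np.ndarray, seg_b: np.ndarray,
                          candidate_offsets: list, score_fn) -> int:
    """score_fn is an assumed pretrained model mapping audio samples to a quality score."""
    scores = []
    for offset in candidate_offsets:
        synthesized = np.concatenate([seg_a[:offset], seg_b])  # candidate splice
        scores.append(score_fn(synthesized))
    return candidate_offsets[int(np.argmax(scores))]
```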
Along with the deterministic methods of identifying musical segments, machine learning (ML) can be employed to derive additional methods of identifying segments. This is accomplished by feeding in manually edited soundtrack cut-points along with the original contributing song files and transforms thereof, so that machine learning can infer “best cut-points” using the manually re-cut music as a “crowdsourced opinion” of what makes a point in music a sensible cut-point, and what causes two originally separate musical elements to be compatible with one another. Both ML and neural networks are used to feed in many different “sonic features” extracted from the original song, including onset density, onset patterns, volume, and the presence of certain sonic elements such as frequency ranges, voices, and basslines, as further correlation data elements to surround the cut-points “professionally selected” by the human editor of the input training set.
For the extraction of emotion from segments in an automated way, the system 10 may send segments to a plurality of automated processors and people, including the original artists, fans of the artists, and random people, via a Mechanical Turk workflow system and user screens for collecting human input of emotion, subject, and other tag types. When two or more automated and/or human inputs are made, if all votes are the same or similar, that segment is authoritatively tagged and placed into an ML and Neural-Net Training Set for the creation of new AI versions that can infer what sonic features of the segment correlate to the selected tag, in the same manner as the segment extractor training process.
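A simple sketch of this vote-consensus step, with hypothetical identifiers and an agreement rule assumed for illustration, follows.

```python
# Minimal sketch (hypothetical workflow): when two or more automated or human votes
# on a segment agree, the tag becomes authoritative and the example joins the
# ML/neural-net training set.

from collections import Counter

def authoritative_tag(votes: list, min_votes: int = 2):
    """Return the agreed tag if enough voters fully agree, else None (unresolved)."""
    if len(votes) < min_votes:
        return None
    tag, count = Counter(votes).most_common(1)[0]
    return tag if count == len(votes) else None

training_set = []
tag = authoritative_tag(["angry", "angry", "angry"])
if tag:
    training_set.append({"segment_id": "seg-123", "tag": tag})  # hypothetical IDs
```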
End-user selections of segments in the search engine generate a signal around which segments are popular and interesting; these selections, correlated to the search terms indicating the intention of the user, are logged and serve to further train AI models using the logs as crowdsourced opinions of both the popularity of a segment and its connection to the intention of the user. Within the digital experience, the system 10 collects event data output from the digital experience to infer what rules the developer used to connect events to soundtrack state selection, thereby crowdsourcing the visual and experience (game) states that surround transition points. This training set trains AI models that can take running game state, in the form of visual analysis and direct data log output from the experience (such as health, steps, power meter, etc.), and correlate such experience states to soundtrack cut-points and transitions, effecting the creation of an AI engine that can auto-connect segment switch events to the soundtrack.
While the system is described herein in relation to digital entertainment experiences, namely video games, the system may have applicability in many other fields. For example, the system can be used with personal electronic devices that measure characteristics of the human body through telemetry, such as heart rate. Accordingly, it is possible for the system to be used in conjunction with these electronic devices to effectively provide composite soundtracks to non-digital experiences, such as exercise or athletics.
It should be noted that any process descriptions or blocks in flow charts should be understood as representing modules, segments, portions of code, or steps that include one or more instructions for implementing specific logical functions in the process, and alternate implementations are included within the scope of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present disclosure.
It should be emphasized that the above-described embodiments of the present disclosure, particularly, any “preferred” embodiments, are merely possible examples of implementations, merely set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) of the disclosure without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and the present disclosure and protected by the following claims.