This invention relates generally to the field of media production. More specifically, the invention discloses new and useful methods and systems for generating mutated media fragments for media production.
Since the advent of software-based programs that allow musicians and producers (m/p) to record and play back notes, sequencers, particularly digital audio workstations (DAWs), have been a mainstay of contemporary music production. Aside from offering robust editing and recording tools, DAWs also allow the m/p to capture MIDI data and utilize sample libraries. Beyond a variety of programmable controls, including generative looping of evolving note patterns, the m/p may also exploit the technology to connect with virtual instruments implemented as software plug-ins; long gone are the days of each synthesizer needing a dedicated keyboard, or of having to lug around the physical instrument in the first place. Despite its widespread adoption, the DAW is largely underused, or worse, misused, due to its complexity of use. The long learning curve and the prohibitive “time-cost” serve as a barrier preventing an entry-level m/p from extracting the full potential of currently offered DAWs.
While attempts have been made at a more intuitive interface to better match m/p workflows, regardless of experience, they have done little to relieve the “time-cost” problem. The industry has dealt with this problem with product segmentation: a dedicated tier each for the beginner, experienced, and expert m/p. The shortcoming of this approach is that the tiers are no different in terms of user interface icons, graphics, prompts, controls, etc.; they simply offer a smaller tool bag to the beginner. Needless to say, while this approach may offer a certain “ease of use”, it unfortunately also imposes a “limit on use”, since the entire suite of tools is not available.
To that end, there is a void in the market and art for a DAW-style interface that outputs an archive of ‘set-it-and-forget-it’ procedurally generated audio fragment/s for downstream audio integration: procedurally generated audio fragment/s that are mutated from an original input based on user-trained inputs; a pipeline that delivers ease of use without compromising the suite of offerings or each tool’s capability; ease of use that includes graphically interactive ways for a user to self-tune during mutation. For that matter, there is likewise a void in the art for allowing a user to intuitively interface with a neural network-trained pipeline for procedurally generating any media fragment for any media integration. Solutions are sorely needed to address the twin issues of non-intuitiveness and a limited toolkit, both of which are bottlenecking creative endeavors across several fields, the most salient of which is audio/music production due to the non-visual nature of the output.
In one generalized aspect, disclosed herein are methods and systems for mutating a media file output, comprising the steps of: a. receiving a user input, wherein the user input is at least one of a media file and/or a response to a survey from a user. Also disclosed is a method for mutating an audio file, said method comprising the steps of: receiving a user input, wherein the user input is at least one of an audio file and/or a response to a composition survey from a first user; entering at least a pattern into a grid sequencer by selecting any number of squares in the grid, wherein each square represents a particular count occupancy at a particular count in a musical composition bar that the user prefers to render in a final output; uploading at least one ‘good’ and ‘bad’ audio file sample by the user to affect the particular count occupancy based on the user input and pattern; and rendering the final output comprising the mutated audio file and a visualization of the grid sequencer in terms of an indicator of a probability of a particular count occupancy based on the user input, pattern, and upload. Forms of media may include any media with any audio and/or video playback, in real time or not. One example may be an audio pipeline, coupled to an audio input source, a neural network (more particularly, weighted-averaged), and a composition engine, that procedurally generates mutated fragment/s (segment/s) of the audio file based on a current and/or historically tracked user submission (input x...n, etc.). The visually indicative grid sequencer, with beat count occupancy probabilities in integer form (optionally color-coded), along with the user-training features, provides a higher resolution of user specificity, with tremendous ease of use in fine-tuning a mutation in a procedurally generated lineage of derived fragments by the user’s preference.
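By way of illustration only, the grid sequencer and its per-count occupancy probabilities may be sketched as follows. This is a minimal model written for clarity; the class, method, and parameter names are the editor's assumptions and do not appear in the disclosure.

```python
import random

class GridSequencer:
    """Illustrative model: each cell holds an integer percentage giving the
    probability that a particular count in the bar is occupied."""

    def __init__(self, rows=4, counts_per_bar=16):
        # rows might represent instrument lanes (kick, snare, hat, ...);
        # every cell starts at 0% occupancy probability.
        self.grid = [[0] * counts_per_bar for _ in range(rows)]

    def toggle(self, row, count, probability=100):
        """User selects a square, marking that count as preferred output."""
        self.grid[row][count] = probability

    def render_bar(self, rng=None):
        """Sample one bar: a cell fires when a random draw falls under its
        occupancy probability."""
        rng = rng or random.Random()
        return [[1 if rng.randrange(100) < p else 0 for p in row]
                for row in self.grid]

seq = GridSequencer()
seq.toggle(0, 0)        # downbeat on lane 0, 100% certain
seq.toggle(1, 4, 60)    # lane 1, count 5, 60% occupancy probability
bar = seq.render_bar(random.Random(7))
```

A cell at 100% always fires in the rendered bar, while probabilistic cells give each render a distinct mutation within the user's pattern.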
Each fragment may optionally be archived for color-spectrum/rated display for visual/fast retrieval of a fragment, or to reorient the currently processed fragment by mining for a more preferred region in the ‘harvest-graph’. The mining may be performed more quickly (and with far greater ease of use) with the guidance of the color-spectrum feature of the ‘harvest-graph’, allowing users to make a ‘quick-capture’ comparison based on the likeness of color to a reference harvest/fragment/segment/file. Furthermore, each archived fragment may additionally be saved, searched, and shared, in any one of a file form, indexed tags, user input, grid sequencer input pattern, or grid sequencer evolving pattern with count occupancy probabilities, for a second user to ‘germinate’ the ‘seed’ based on their ‘mutation-tuning’ preferences.
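One way such a color-coded ‘harvest-graph’ could work is to map audio features of a fragment onto a color, so fragments that sound alike land near alike colors. The two features and the mapping below are purely illustrative assumptions by the editor, not part of the disclosure.

```python
def fragment_color(density, brightness):
    """Map two illustrative audio features (each 0.0-1.0) to an RGB hex
    color for 'quick-capture' visual comparison. Feature choice and
    channel assignment are assumptions."""
    r = int(255 * density)
    g = int(255 * brightness)
    b = int(255 * (1.0 - density))
    return f"#{r:02x}{g:02x}{b:02x}"

# Two dense, bright fragments receive similar colors:
c1 = fragment_color(0.9, 0.8)
c2 = fragment_color(0.85, 0.82)
```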
The linchpin of this hyper-specific ‘mutation tuning’ is the option for users to submit training samples, designated as ‘good’ or ‘bad’ audio samples by the user based on his or her personal preference. In one generalized aspect, disclosed herein are also systems and methods for mutating a media harvest, comprising the steps of: a) receiving user input, wherein the user input is at least one of an audio (media) file and/or a response to a survey from a user; and b) generating a mutated audio fragment based on the user input and the ‘good’/‘bad’ sample audio files submitted by the user. Furthermore, the pipeline uses further submissions/inputs from the user and the composition engine to mutate and guide these germinated harvest results in favorable ways, where ‘good’ submissions increase the characteristics associated with the corresponding algorithm variables and ‘bad’ submissions have the opposite effect. Furthermore, while some aspects may not require the user to enter a pattern into the grid sequencer as a starting point in the pipeline, processing/procedural generation may still output a visually indicative grid sequencer with beat count occupancy probabilities in integer form (optionally color-coded), to allow the user to adjust parameters/training uploads on the fly, adjusting the count occupancy probabilities in near-time and, in turn, outputting a more user-tuned ‘mutated fragment’. These and other features and improvements of the present application will become apparent to one of ordinary skill in the art upon review of the following detailed description when taken in conjunction with the several drawings and the appended claims.
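The ‘good’/‘bad’ effect on the algorithm variables can be sketched as a simple bounded weight update: ‘good’ submissions nudge per-count weights toward the sample, ‘bad’ submissions nudge them away. This is a minimal sketch under the editor's assumptions (parallel lists of per-count values, a fixed learning rate); the actual update rule is not specified in the text.

```python
def update_weights(weights, sample_counts, label, lr=0.1):
    """Nudge per-count occupancy weights toward a 'good' sample and away
    from a 'bad' one, clamped to [0, 1]."""
    sign = 1 if label == "good" else -1
    return [min(1.0, max(0.0, w + sign * lr * x))
            for w, x in zip(weights, sample_counts)]

weights = [0.5, 0.5, 0.5, 0.5]
good = [1, 0, 1, 0]   # counts occupied in a sample the user liked
bad  = [0, 1, 0, 0]   # counts occupied in a sample the user disliked
weights = update_weights(weights, good, "good")
weights = update_weights(weights, bad, "bad")
```

After both updates, counts present in the ‘good’ sample have risen and the count present in the ‘bad’ sample has fallen, while untouched counts are unchanged.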
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description, which sets forth illustrative embodiments in which the principles of the invention are utilized, and the accompanying drawings, of which:
Numerous embodiments of the invention will now be described in detail with reference to the accompanying figures. The following description of the embodiments of the invention is not intended to limit the invention to these embodiments but rather to enable a person skilled in the art to make and use this invention. Variations, configurations, implementations, and applications described herein are optional and not exclusive to the variations, configurations, implementations, and applications they describe. The invention described herein can include any permutations of these variations, configurations, implementations, and applications.
In the following description, numerous specific details are outlined in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details.
Reference in this specification to “one embodiment” or “an embodiment” or “some embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiment(s) is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” or “in some embodiments” in various places in the specification are not necessarily all referring to the same embodiment(s), nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
As a person skilled in the art will recognize from the previous detailed description and the figures and claims, modifications and changes can be made to the embodiments of the invention without departing from the scope of this invention as disclosed in the present application. It will be appreciated that, although the methods, processes, and functions of the present application have been recited in a particular series of steps, the individual steps of the methods, processes, and functions may be performed in any order, in any combination, or individually.
Embodiments are described at least in part herein regarding flowchart illustrations and/or block diagrams of methods, systems, and computer program products and data structures according to embodiments of the disclosure. It will be understood that each block of the illustrations, and combinations of blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block or blocks.
The aforementioned computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function/act specified in the block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus, to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the block or blocks.
In general, the word “module” as used herein refers to logic embodied in hardware or firmware, or to a collection of software instructions, written in a programming language, such as Java, C, etc. One or more software instructions in the module may be embedded in firmware. The modules described herein may be implemented as either software and/or hardware modules and may be stored in any type of non-transitory computer-readable medium or other non-transitory storage elements. Some non-limiting examples of non-transitory computer-readable media include CDs, DVDs, BLU-RAY, flash memory, and hard disk drives.
Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.
As used herein, an “audio-visual file” or “AV file” is a series of one or more audio-visual (AV) clips recorded on the same video source (e.g., a single video camera). Two or more “parallel AV files” are recordings of the same action recorded on two or more respective video sources.
Now in reference to
In one embodiment, a system may comprise: a rendering module 107, 207; a visualization module 105, 205; a processor; a memory element coupled to the processor; and a program executable by the processor, over a network 103, to render a mutated audio output comprising the mutated audio file and a visualization of the grid sequencer in terms of an indicator of a probability of a particular count occupancy based on the user input, pattern, and upload (PILE) 101, 102. The interactive visual enables a user to tune/train the construct for hyper-specific randomized mutations/germinations derived from the PILE 101, 102. As shown in
The network 103 may be any suitable wired network, wireless network, a combination of these, or any other conventional network, without limiting the scope of the present invention. A few examples may include a LAN or wireless LAN connection, an Internet connection, a point-to-point connection, or other network connections and combinations thereof. The network 103 may be any other type of network that is capable of transmitting or receiving data to/from host computers, personal devices, telephones, video/image capturing devices, video/image servers, or any other electronic devices. Further, the network 103 is capable of transmitting/sending data between the mentioned devices. Additionally, the network 103 may be a local, regional, or global communication network, for example, an enterprise telecommunication network, the Internet, a global mobile communication network, or any combination of similar networks. The network 103 may be a combination of an enterprise network (or the Internet) and a cellular network, in which case, suitable systems and methods are employed to seamlessly communicate between the two networks. In such cases, a mobile switching gateway may be utilized to communicate with a computer network gateway to pass data between the two networks. The network 103 may include any software, hardware, or computer applications that can provide a medium to exchange signals or data in any of the formats known in the art, related art, or developed later.
Preferred embodiments may include the addition of a remote server or cloud server 508 to further provide for back-end functionality and provisioning/analytical support 510. The server 508 may be situated adjacent to or remotely from the system and connected to each system via a communication network 103. In one embodiment, the server 508 may be used to support user behavior profiling; user history function; predictive learning/analytics; alert function; network sharing function; digital footprint tracking; visualization, graphical interactivity, etc. (510).
The electronic computing device may be any electronic device capable of sending, receiving, and processing information. Examples of the computing device include, but are not limited to, a smartphone, a mobile device/phone, a Personal Digital Assistant (PDA), a computer, a workstation, a notebook, a mainframe computer, a laptop, a tablet, a smartwatch, an internet appliance, and any equivalent device capable of processing, sending, and receiving data. The electronic computing device can include any number of sensors or components configured to intake or gather data from a user of the electronic computing device including, but not limited to, a camera, a heart rate monitor, a temperature sensor, an accelerometer, a microphone, and a gyroscope, to assess a state of the user for informing the user profile/context for more user-specific randomized mutation/germination. The electronic computing device can also include an input device (e.g., a touchscreen, keyboard, or mouse) through which a user may provide touch and/or cursor-control input commands. Multiple inputs from a single user computing device (as shown in
In another embodiment of the present invention, the rendering/mutation algorithm may employ unsupervised machine learning to learn the features of drum count occupancy probability from the PILE and iterative (i) inputs (any input beyond the PILE) for final rendering. For example, a Neural Network Autoencoder can be used to learn the features and then train a Deep Neural Network or a Convolutional Neural Network. The classification may be based on a supervised or unsupervised machine learning technique, and the classification is performed by analyzing one or more features of the inputs (PILE/i). Such approaches result in hyper user specificity in what may otherwise appear to be a randomized mutation, not to mention a reduction in power consumption and/or an increase in detection speed and accuracy.
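A deliberately simplified stand-in for such a learned classifier is a nearest-centroid rule over feature vectors: summarize the user's ‘good’ and ‘bad’ batches by their mean features, then label a new fragment by the closer centroid. This sketch is the editor's simplification, not the autoencoder/DNN/CNN pipeline itself, and the two feature names are assumed for illustration.

```python
def centroid(samples):
    """Mean feature vector of a batch of feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def classify(features, good_centroid, bad_centroid):
    """Label a fragment by whichever batch centroid its features are
    closer to (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    if dist2(features, good_centroid) <= dist2(features, bad_centroid):
        return "good"
    return "bad"

# Hypothetical features per fragment: (density, syncopation)
good_c = centroid([[0.9, 0.1], [0.8, 0.2]])
bad_c  = centroid([[0.1, 0.9], [0.2, 0.8]])
label  = classify([0.7, 0.3], good_c, bad_c)
```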
Additionally, in another embodiment of the invention, the system may comprise a back-propagated neural network that uses a series of externally captured buffers containing known audio-visual sources to aid in real-time recognition of the audio and video input, using a probabilistic approach to determine the presence of such sources in a captured buffer. A classification algorithm may be based on supervised machine learning techniques such as SVM, Decision Tree, Neural Net, Ada Boost, and the like. Further, the classification may be performed by analyzing one or more features based on any one of, or combination of, any PILE/i.
While not shown in
Now in reference to
In continuing reference to
Below the Confidence drop-down lies the Seed integer drop-down, specifying an identifier for the random instance with which the output file was rendered. Different seeds render different composition results in the output audio file within the parameters specified in the confidence score value box. The seed may be seen as a distinct procedurally generated fragment derived from at least one of the user's pattern, input, load, or entered values. The seed may be at least one of: saved, played back, uploaded for training, shared with another user for seed germination based on the other user's preferences, or scraped to determine the first user's seedling characteristics (pattern, input, load, or entered). Following the user's input of confidence and seed values, the user may then enter a Reps value, by drop-down or by manually entering a text/numeric value, indicating how many times the neural network will be trained. By increasing the Reps value, sequencer composition outliers will be further controlled for. By controlling the Reps value, the user has an additional incremental mutation-tuning tool, allowing the user to engage in an ever-so-slight germination trajectory yet again. This manipulation serves as a more fine-tuned technique of filtering results that are more or less sporadic, similar to a limiter audio effect by which a vocalist's volume level is kept consistent despite varying spacing from the microphone. The confidence parameters establish a ceiling whose curvature is tightened or loosened by the Reps integer. The default value is 100,000. What's more, the seeds may be visually depicted in a graph, based on a pre-defined color-coded analogy of a sound/sound feature/sound characteristic, for further mutation tuning, processing, sharing, etc.
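The interaction of Seed, Confidence, and Reps can be sketched as follows. The parameter semantics here are the editor's assumptions for illustration: the seed fixes the random instance (so the same seed always reproduces the same mutation), confidence caps how far a count may drift from the entered pattern, and averaging over many reps damps outlier mutations.

```python
import random

def render_seed(pattern, seed, confidence, reps):
    """Sketch: deterministically mutate a 0/1 pattern. Each count drifts
    by an average of `reps` random draws, scaled by `confidence` and
    clamped to [0, 1]."""
    rng = random.Random(seed)          # same seed -> same random instance
    out = []
    for p in pattern:                  # p: entered occupancy, 0 or 1
        drift = sum(rng.uniform(-1, 1) for _ in range(reps)) / reps
        out.append(min(1.0, max(0.0, p + confidence * drift)))
    return out

bar = render_seed([1, 0, 0, 1], seed=42, confidence=0.25, reps=100)
```

Re-rendering with the same seed yields an identical result, while a higher reps count pulls the averaged drift toward zero, controlling for outliers much as the text describes.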
The user then enters a Harvest value as an integer, indicating how long the output audio file will be, expressed in bars. The value determines the size of the mutated audio file and/or final rendered output. The textbox's default integer is 4 bars. Each bar will contain a unique mutation, making it easier for the user to systematically review them one after another.
The user then presses the Upload button under the Good/1 Batch Size title, which displays an integer indicating the batch collection size the user wishes to reflect in the final rendered audio output file. Selecting the “Upload” button summons the device's native OS file explorer window, where the user can select a text composition file on the device's hard drive. The user presses the Load Notation button to increment the Good/1 Batch Size integer by one, indicating that another composition has been stored in the good batch group before training the neural network. The user presses Train Good/1 Batch to train the neural network, where the weights between the perceptron beat count and the output neuron are represented as 1 or 0, depending on whether the sample was loaded as a ‘good’ or ‘bad’ training sample. This is the leading factor in significantly increasing beat count occupancy for the associated X counts in the text composition file. The Training Dataset integer increases by the ‘good’ compositions' batch quantity. Conversely, the ‘bad’ batch composition files negatively affect the weights instead, since they belong to the Bad/0 Batch group.
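The Good/1 and Bad/0 batch training described above can be sketched as a perceptron-style loop: each composition is a 0/1 vector of beat-count occupancy, the training target is 1 for ‘good’ and 0 for ‘bad’, and the learned per-count weights raise occupancy for counts that appear in ‘good’ batches. The exact update rule and learning rate are the editor's assumptions.

```python
def train_batches(num_counts, good_batch, bad_batch, reps=1000, lr=0.01):
    """Train per-count weights from Good/1 and Bad/0 batches.
    Counts present in 'good' compositions gain positive weight; counts
    present only in 'bad' compositions are held down."""
    weights = [0.0] * num_counts
    data = [(c, 1) for c in good_batch] + [(c, 0) for c in bad_batch]
    for _ in range(reps):
        for counts, target in data:
            # perceptron-style step toward the 1/0 target
            activation = sum(w * x for w, x in zip(weights, counts))
            output = 1 if activation > 0.5 else 0
            err = target - output
            weights = [w + lr * err * x for w, x in zip(weights, counts)]
    return weights

good = [[1, 0, 1, 0]]     # X counts the user liked
bad  = [[0, 1, 0, 0]]     # X counts the user disliked
w = train_batches(4, good, bad)
```

After training, the counts occupied in the ‘good’ batch carry positive weight while the ‘bad’-only count stays at zero, which is the effect on occupancy probability that the interface exposes.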
The Beat Chamber (BC) is the composition that is dialed into the sequencer window, or dialed automatically by uploading an individual composition file, toggling the beat count squares from light to dark gray (state 1 to state 2). This composition is sent through a channel that bypasses the mutation process from the neural network (
Exemplary Script Excerpt:
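The original script excerpt is not reproduced in this text. As a stand-in only, the following hypothetical fragment sketches how the pieces described above might be chained: bias the entered pattern by the good/bad batches, then render one unique mutation per harvest bar from a fixed seed. All names, thresholds, and scaling factors are the editor's assumptions.

```python
import random

def run_pipeline(pattern, good_batch, bad_batch, seed=0, harvest_bars=4):
    """Hypothetical end-to-end sketch: per-count bias from the training
    batches (+1 vote per 'good' occupancy, -1 per 'bad'), then one
    seeded random mutation per harvest bar."""
    rng = random.Random(seed)
    bias = [sum(g[i] for g in good_batch) - sum(b[i] for b in bad_batch)
            for i in range(len(pattern))]
    bars = []
    for _ in range(harvest_bars):
        bar = [1 if (p + 0.2 * b + rng.uniform(-0.5, 0.5)) > 0.5 else 0
               for p, b in zip(pattern, bias)]
        bars.append(bar)
    return bars

harvest = run_pipeline([1, 0, 0, 1], [[1, 0, 1, 0]], [[0, 1, 0, 0]], seed=3)
```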