The present disclosure relates to automatically preparing a Musical Instrument Digital Interface (MIDI) file.
A piano roll, e.g. of a MIDI file, contains notes, each of which is defined by properties such as its pitch p, onset o (start time), duration d and velocity v.
A rhythm is obtained by ignoring the pitch information.
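The following is a minimal sketch of this representation. It assumes each note carries only the four properties above. `Note` and `rhythm` are hypothetical names, not part of any MIDI library. The rhythm is every property except the pitch:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Note:
    """One piano-roll note; field names are illustrative only."""
    onset: float     # start time, e.g. in beats
    duration: float  # length, e.g. in beats
    velocity: int    # MIDI velocity, 0-127
    pitch: int       # MIDI pitch number, 0-127


def rhythm(notes: List[Note]) -> List[Tuple[float, float, int]]:
    """Return the rhythm of a note sequence: all properties except the pitch."""
    return [(n.onset, n.duration, n.velocity) for n in notes]


melody = [Note(0.0, 1.0, 90, 60), Note(1.0, 0.5, 80, 64), Note(1.5, 0.5, 80, 67)]
print(rhythm(melody))  # [(0.0, 1.0, 90), (1.0, 0.5, 80), (1.5, 0.5, 80)]
```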
It is an objective of the present invention to provide a new MIDI file based on a source MIDI file and a target MIDI file. In accordance with some embodiments of the present invention, the new MIDI file may be regarded as a re-harmonisation of the target MIDI file, using pitches based on the source MIDI file.
According to an aspect of the present invention, there is provided a method of automatically preparing a MIDI file based on a target MIDI file comprising respective note information about each of a plurality of target notes of the target MIDI file and a source MIDI file comprising respective note information about each of a plurality of source notes of the source MIDI file. Each note information, of both target and source notes, comprises pitch information defining a pitch of the note. The method comprises ranking the plurality of target notes based on the pitch of each target note. The method also comprises, for each of the ranked target notes, removing the pitch information from the note information of said ranked target note. The method also comprises, for each of the ranked target notes, replacing the removed pitch information with pitch information of a corresponding source note, whereby said target note has the same pitch as the corresponding source note (since the pitch information is now the same as for the corresponding source note), forming a plurality of new notes of a new MIDI file. Thus, each new note has a pitch of a corresponding source note.
According to another aspect of the present invention, there is provided a computer program product (e.g. a non-transitory computer readable storage medium) comprising computer-executable components for causing an electronic device to perform an embodiment of the method of the present disclosure when the computer-executable components are run on processing circuitry comprised in the electronic device.
According to another aspect of the present invention, there is provided an electronic device configured for performing an embodiment of the method of the present disclosure. Thus, the electronic device is configured for automatically preparing a MIDI file based on a target MIDI file comprising respective note information about each of a plurality of target notes of the target MIDI file and a source MIDI file comprising respective note information about each of a plurality of source notes of the source MIDI file. Each note information comprises pitch information defining a pitch of the note. The electronic device comprises processing circuitry, and data storage storing instructions executable by said processing circuitry whereby said electronic device is operative to rank the plurality of target notes based on the pitch of each target note; for each of the ranked target notes, remove the pitch information from the note information of the target note; and for each of the ranked target notes, replace the removed pitch information with pitch information of a corresponding source note, whereby the target note has the same pitch as the corresponding source note, forming a plurality of new notes of a new MIDI file.
By replacing the pitch information of the target notes with pitch information of the source notes, the rhythm of the target MIDI file may be maintained while the target is re-harmonized with the source notes. Thus, a new MIDI file is automatically provided based on the source and target MIDI files. The new MIDI file may then be output and played.
Embodiments of the method of the present disclosure may be regarded as a type of style or rhythm transfer. Style transfer has previously been proposed for images using convolutional networks, e.g. “A Neural Algorithm of Artistic Style”, Gatys et al. Style transfer has also been applied to symbolic music using Generative Adversarial Networks (GANs), e.g. “Symbolic Music Genre Transfer with CycleGAN”, Brunner et al. The present invention is more specific, in that harmony (pitches) and rhythm are transferred to a new note sequence of a new MIDI file. In practice, the results may be more musical, e.g. no wrong notes may be introduced, since every new pitch is taken from the source MIDI file. Also, in some embodiments of the present invention, the invention works on a single source MIDI file and a single target MIDI file (there is no need for training on large datasets), and the result may be more predictable, e.g. for a user. Also, the parameters are natural, which may allow users to experiment with many meaningful combinations.
It is to be noted that any feature of any of the aspects may be applied to any other aspect, wherever appropriate. Likewise, any advantage of any of the aspects may apply to any of the other aspects. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following detailed disclosure, from the attached dependent claims as well as from the drawings.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to “a/an/the element, apparatus, component, means, step, etc.” are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated. The use of “first”, “second” etc. for different features/components of the present disclosure are only intended to distinguish the features/components from other similar features/components and not to impart any order or hierarchy to the features/components.
The embodiments disclosed herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. Like reference numerals refer to corresponding parts throughout the drawings and specification.
Embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments are shown. However, many other embodiments in different forms are possible within the scope of the present disclosure; the following embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. Like numbers refer to like elements throughout the description.
It is noted that when MIDI files are referred to herein, it is often the audio (e.g. the sequence of notes) encoded by the MIDI file that is intended. The length of a MIDI file, or of a segment thereof, may thus be regarded as e.g. the number of bars or beats of the audio encoded thereby, or as the time duration of the audio when played at a predetermined tempo.
The table of
In accordance with embodiments of the present invention, the new MIDI file N has the same (preferably exactly the same) rhythm r as the target MIDI file T. This implies that the sequence of notes n in the new MIDI file N may be the same as in the target MIDI file, and that the notes retain the same properties as in the target MIDI file T, e.g. onset o, duration d and velocity v, except for the pitch p. Optionally, additional property(ies), e.g. timbre, may be included in the rhythm r which is maintained between the new and target MIDI files. However, optionally, there may also be other property(ies) of the notes n, other than pitch p, which are not included in the maintained rhythm r.
In accordance with embodiments of the present invention, the pitches p of the new MIDI file N are based on the pitches of the source MIDI file S; they are preferably the same as the pitches of the notes of the source MIDI file, but typically not in the same order as in the note sequence of the source MIDI file. Thus, embodiments of the present invention may be regarded as substituting pitches of the source MIDI file S for the pitches of the target MIDI file T. The substitution may be done by a mapping which preferably finds a reasonable trade-off between two factors: the pitch distributions of the source and target MIDI files, which may be completely different, and the respective rankings (e.g. high to low or low to high) of their pitches, e.g. such that low pitches of the target MIDI file are substituted by low pitches of the source MIDI file and high pitches of the target MIDI file are substituted by high pitches of the source MIDI file. More generally, embodiments of the present invention allow harmonic (pitch) and rhythmic information from any two MIDI files (called source and target MIDI files herein) to be mixed to produce a new MIDI file N.
Different automated approaches may be used for achieving the pitch substitution. One approach, herein called the naïve method, may (with reference to
In the example of
Then, when the new notes nN are reordered in the same order as the original sequence of the target notes nT in the target MIDI file T, to form a sequence of new notes nN:1-nN:8, the properties of the new notes are as presented in the table of
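The naïve method can be sketched as follows. The sketch assumes equal numbers of source and target notes and ascending pitch order. `naive_transfer` is a hypothetical name. It ranks the target notes by index instead of moving the notes themselves. Because of this, the notes stay in their original sequence, and the reordering step described above happens implicitly:

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Note:
    onset: float
    duration: float
    velocity: int
    pitch: int


def naive_transfer(target: List[Note], source: List[Note]) -> List[Note]:
    """Give each target note the pitch of the source note with the same pitch rank.

    Assumes len(target) == len(source); unequal counts must be balanced first.
    """
    if len(target) != len(source):
        raise ValueError("balance the note counts before transferring pitches")
    # Target list: target indices sorted by ascending pitch (stable for ties).
    target_list = sorted(range(len(target)), key=lambda i: target[i].pitch)
    # Source list: source pitches sorted in the same (ascending) order.
    source_list = sorted(n.pitch for n in source)
    new_notes = list(target)
    for rank, i in enumerate(target_list):
        # Keep onset, duration and velocity; take the same-rank source pitch.
        new_notes[i] = replace(target[i], pitch=source_list[rank])
    return new_notes  # still in the original target order


target = [Note(0.0, 1.0, 100, 67), Note(1.0, 1.0, 100, 60), Note(2.0, 1.0, 100, 64)]
source = [Note(0.0, 2.0, 80, 57), Note(0.0, 2.0, 80, 69), Note(0.0, 2.0, 80, 64)]
print([n.pitch for n in naive_transfer(target, source)])  # [69, 57, 64]
```

Python's sort is stable, so equal target pitches keep their original order here. The text does not specify how ties are handled.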
Additionally or alternatively, an approach using another algorithm, e.g. utilizing machine learning, may be used. With such an algorithm, the replacing of the removed pitch information with pitch information from source notes may comprise determining a probability distribution of the plurality of source notes based on the pitch of each of the source notes, and determining for each of the sorted target notes nT, its corresponding source note nS based on the determined probability distribution, wherein the determining of the probability distribution may be by means of a pre-trained model, e.g. comprising machine-learning such as neural networks.
In an example of a machine learning approach, the method may comprise the following steps:
The pre-trained model may e.g. be trained in the following way:
Generally, pitch (harmonic) information Ip from the source MIDI file S is mixed with rhythm information Ir from the target MIDI file T to automatically prepare the new MIDI file N.
In case the number of target notes is not the same as the number of source notes, notes can be added to or removed from either the plurality of target notes or the plurality of source notes, such that the number of target notes is the same as the number of source notes. Removal of note(s) may be done randomly, or in any suitable non-random way. Added note(s) may e.g. be octave note(s), or any other note(s) which are e.g. more suitable for preserving the harmony of the source MIDI file. Generally, the replacing of the removed pitch information comprises: if the plurality of source notes nS contains a higher number of notes than the plurality of target notes nT, removing, e.g. randomly, at least one source note from the plurality of source notes or adding at least one note, e.g. an octave note, to the plurality of target notes, such that the plurality of source notes contains the same number of notes as the plurality of target notes; or, if the plurality of source notes nS contains a lower number of notes than the plurality of target notes nT, removing, e.g. randomly, at least one target note from the plurality of target notes or adding at least one note, e.g. an octave note, to the plurality of source notes, such that the plurality of source notes contains the same number of notes as the plurality of target notes.
A more specific example applies when the plurality of source notes contains fewer notes, and thus fewer source pitches, than the plurality of target notes. A pitch range, e.g. [m−8, M+8], is calculated, where m is the lowest pitch and M is the highest pitch occurring among the plurality of source notes and the plurality of target notes taken together. Then, a pitch p of the plurality of source notes is sought for which q=p+12 or q=p−12 lies within the range, i.e. such that m−8≤q≤M+8. If such a pitch p is found, the octave pitch q is added to the source pitches (e.g. of the source list LS). If more pitches need to be added, the procedure can be repeated. If no such pitch p is found, a random pitch may instead be removed from the target pitches (e.g. of the target list LT), thus simplifying the rhythm r.
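The octave-padding step can be sketched as follows for the case where the source has fewer notes than the target. The function names are hypothetical. The search order is an assumption: the lowest source pitch is tried first, and the upper octave before the lower one. Skipping octave pitches that are already present is also an assumption, so that repeated additions do not produce duplicates:

```python
import random
from typing import List, Optional, Tuple


def find_octave_pitch(source: List[int], lo: int, hi: int) -> Optional[int]:
    """Return q = p + 12 or q = p - 12 for some source pitch p, with lo <= q <= hi.

    Assumption: pitches already in `source` are skipped, so repeated calls add
    new octave notes rather than duplicates.
    """
    for p in sorted(source):
        for q in (p + 12, p - 12):  # assumed order: upper octave first
            if lo <= q <= hi and q not in source:
                return q
    return None


def balance_by_octaves(source: List[int], target: List[int],
                       rng: random.Random) -> Tuple[List[int], List[int]]:
    """Add octave notes to the source pitches until they match the target count.

    When no octave note fits in [m - 8, M + 8], a random target pitch is removed
    instead, which simplifies the rhythm.
    """
    source, target = list(source), list(target)
    if len(source) >= len(target):
        return source, target  # nothing to add in this case
    m = min(source + target)  # lowest pitch among source and target notes
    M = max(source + target)  # highest pitch among source and target notes
    while len(source) < len(target):
        q = find_octave_pitch(source, m - 8, M + 8)
        if q is not None:
            source.append(q)
        else:
            target.pop(rng.randrange(len(target)))
    return source, target


print(balance_by_octaves([60, 64, 67], [55, 60, 62, 64, 67, 72], random.Random(0)))
# ([60, 64, 67, 72, 48, 76], [55, 60, 62, 64, 67, 72])
```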
In some embodiments of the present invention, the plurality of source notes are the notes of a segment of the source MIDI file S, and the plurality of target notes are the notes of a segment of the target MIDI file T. A segment of the new MIDI file N is formed from these two segments. Embodiments of the method of the present disclosure may then be performed for any pair of one source segment and one target segment, e.g. until all source notes and all target notes of the source and target MIDI files have been processed in accordance with the method (i.e. have been included at least once in the pluralities of source and target notes discussed herein). For example, the method may be applied to each successive segment of the source MIDI file in combination with the corresponding successive segment of the target MIDI file, such that e.g. segment i of the source MIDI file is combined with segment i of the target MIDI file, e.g. regardless of the numbers of source and target segments. If the numbers of notes in the two segments of any pair differ, notes may be added or removed as discussed herein.
In case the number of source segments is not the same as the number of target segments, the mapping of segments to each other may be stretched so that every source segment and every target segment is used at least once. This ensures that all notes (i.e. the note information In thereof) in each file are processed with an embodiment of the method of the present disclosure. For instance, the shorter sequence of segments (of either the source or the target MIDI file) may be looped to form as many segments as the longer sequence.
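The following sketch shows the stretched segment mapping. It assumes segments are already available as sequences. `pair_segments` is a hypothetical helper that loops the shorter sequence with `itertools.cycle`:

```python
from itertools import cycle, islice
from typing import List, Sequence, Tuple, TypeVar

S = TypeVar("S")


def pair_segments(source_segments: Sequence[S],
                  target_segments: Sequence[S]) -> List[Tuple[S, S]]:
    """Pair segment i of the source with segment i of the target.

    The shorter sequence is looped so that every segment of both files is used
    at least once and the result is as long as the longer sequence.
    """
    n = max(len(source_segments), len(target_segments))
    looped_source = islice(cycle(source_segments), n)
    looped_target = islice(cycle(target_segments), n)
    return list(zip(looped_source, looped_target))


print(pair_segments(["S1", "S2"], ["T1", "T2", "T3", "T4", "T5"]))
# [('S1', 'T1'), ('S2', 'T2'), ('S1', 'T3'), ('S2', 'T4'), ('S1', 'T5')]
```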
A MIDI file (i.e. the sequence of notes n encoded thereby) may be segmented into only one segment (the whole file is then considered), or into regular segments of e.g. one beat, two beats, one bar, etc. The file can also be segmented into irregular segments.
A different segmentation can be used for each of the source and target MIDI files. For instance, a source MIDI file in 3/4 can be segmented every three beats (1 bar), and if the target MIDI file is in 4/4 it can be segmented every four beats (also 1 bar). This may allow the harmony of a piece in 3/4 to be applied to the rhythm of a piece in 4/4, or vice versa.
Arbitrary combinations of segmenting schemes can be used, creating different results. A default segmenting scheme can be set (e.g. each two beats for both the source and the target MIDI files), but any other segmenting scheme may alternatively be used, e.g. by a musician who is experimenting.
When the method is applied to segments, then the successive results, i.e. the resulting sequence of new segments of the new MIDI file N, typically have to be concatenated to each other to produce a single new MIDI file.
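The segmenting and concatenation steps can be sketched as below. The sketch assumes onsets and durations are measured in beats. It also assumes a note belongs to the segment in which it starts, so notes crossing a boundary are not split. The names are hypothetical. Each file gets its own segment length, e.g. 3 beats for a 3/4 source and 4 beats for a 4/4 target. Since the new file keeps the target's rhythm, this sketch assumes its segments are concatenated using the target's segment length:

```python
import math
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Note:
    onset: float     # in beats
    duration: float  # in beats
    velocity: int
    pitch: int


def segment(notes: List[Note], beats_per_segment: float,
            total_beats: float) -> List[List[Note]]:
    """Split notes into regular segments; onsets become segment-relative."""
    count = max(1, math.ceil(total_beats / beats_per_segment))
    segments: List[List[Note]] = [[] for _ in range(count)]
    for n in notes:
        # A note goes to the segment it starts in (assumption).
        i = min(int(n.onset // beats_per_segment), count - 1)
        segments[i].append(replace(n, onset=n.onset - i * beats_per_segment))
    return segments


def concatenate(segments: List[List[Note]], beats_per_segment: float) -> List[Note]:
    """Join processed segments back into one note sequence with absolute onsets."""
    return [replace(n, onset=n.onset + i * beats_per_segment)
            for i, seg in enumerate(segments) for n in seg]


# A 4/4 target segmented per bar (4 beats); a 3/4 source would use 3 instead.
target = [Note(0.0, 1.0, 90, 60), Note(2.0, 1.0, 90, 62),
          Note(4.0, 2.0, 90, 64), Note(6.0, 2.0, 90, 65)]
bars = segment(target, beats_per_segment=4, total_beats=8)
print([[n.onset for n in bar] for bar in bars])  # [[0.0, 2.0], [0.0, 2.0]]
```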
The method comprises ranking M1 the plurality of target notes nT based on the pitch p of each target note. In some embodiments, the ranking M1 comprises sorting M11 the plurality of target notes nT based on the pitch p of each of the target notes to form a target list LT.
The method also comprises, for each of the ranked M1 target notes nT, removing M2 the pitch information Ip from the note information In of the target note. However, the rhythm information Ir of the target note nT typically remains part of the note information In of said target note.
The method also comprises, for each of the ranked M1 target notes nT, replacing M3 the removed M2 pitch information with pitch information Ip of a corresponding source note nS. The target note thereby gets the same pitch p as the corresponding source note, forming a plurality of new notes nN of a new MIDI file N. Thus, the note information In of each of the new notes nN of the note sequence of the new MIDI file N typically comprises rhythm information Ir from a target note nT and pitch information Ip from a corresponding source note nS.
In some embodiments, the replacing M3 comprises sorting M12 the plurality of source notes nS based on the pitch p of each of the source notes to form a source list LS, and for each of the sorted M11 target notes nT, determining M13 its corresponding source note nS as the source note having the same rank in the source list as the target note has in the target list. Thus, the source note which has the same rank in the source list LS, e.g. any of the ranks 1st to 8th of
In some embodiments, the replacing M3 comprises determining M21 a probability distribution of the plurality of source notes based on the pitch p of each of the source notes, and for each of the sorted target notes nT, determining M22 its corresponding source note nS based on the determined M21 probability distribution. In some embodiments, the determining M21 of the probability distribution is done by means of a pre-trained model, e.g. comprising machine-learning such as neural networks.
In some embodiments, typically independently of how the corresponding source notes are determined, the replacing M3 comprises: if the plurality of source notes nS contains a higher number of notes than the plurality of target notes nT, removing, e.g. randomly, at least one source note from the plurality of source notes or adding at least one note, e.g. an octave note, to the plurality of target notes, such that the plurality of source notes contains the same number of notes as the plurality of target notes; or, if the plurality of source notes nS contains a lower number of notes than the plurality of target notes nT, removing, e.g. randomly, at least one target note from the plurality of target notes or adding at least one note, e.g. an octave note, to the plurality of source notes, such that the plurality of source notes contains the same number of notes as the plurality of target notes.
In some embodiments, the method 300 comprises block 304, which ranks the plurality of target notes based on the pitch of each target note. For example, the target notes may be ranked in ascending order, from the lowest pitch to the highest pitch. Alternatively, the target notes may be ranked in descending order, from the highest pitch to the lowest pitch. Alternatively, the target notes may be ranked by proximity to a pre-selected or desired frequency, i.e. according to how close the frequency of each target note's pitch is to the pre-selected or desired frequency.
In some embodiments, the method 300 further includes block 306, wherein the ranking comprises sorting the plurality of target notes based on the pitch of each target note to form a target list. The list may be in ascending order, in descending order, or in order of proximity to the pre-selected or desired frequency.
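The three ranking orders can be sketched as sort keys. The sketch assumes standard equal temperament, with A4 (MIDI pitch 69) at 440 Hz, for converting pitches to frequencies. `rank_pitches` and its `order` values are hypothetical:

```python
from typing import List, Optional


def midi_to_hz(pitch: int) -> float:
    """Equal-tempered frequency of a MIDI pitch, with A4 (pitch 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


def rank_pitches(pitches: List[int], order: str = "ascending",
                 desired_hz: Optional[float] = None) -> List[int]:
    """Sort pitches into a list using one of the three orders described above."""
    if order == "ascending":
        return sorted(pitches)
    if order == "descending":
        return sorted(pitches, reverse=True)
    if order == "proximity":
        if desired_hz is None:
            raise ValueError("proximity ranking needs a desired frequency")
        # The pitch whose frequency is closest to the desired one comes first.
        return sorted(pitches, key=lambda p: abs(midi_to_hz(p) - desired_hz))
    raise ValueError(f"unknown order: {order}")


print(rank_pitches([60, 72, 67, 55], "proximity", desired_hz=440.0))  # [67, 72, 60, 55]
```

Proximity is measured here as an absolute difference in hertz, which is what "closest frequency" literally suggests. Measuring in semitones instead would weight low and high registers differently.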
In some embodiments, the method 300 further includes block 308 wherein for each of the ranked target notes, the pitch information is removed from the note information of the target note.
In some embodiments, the method further includes block 310. In block 310, for each of the ranked target notes, the removed pitch information is replaced with pitch information of a corresponding source note, whereby the target note has the same pitch as the corresponding source note, forming a plurality of new notes of a new MIDI file. In some embodiments, the pitch information of the source notes is also ranked. In some embodiments, the source notes are ranked in ascending order, from the lowest pitch to the highest pitch. Alternatively, the source notes may be ranked in descending order, from the highest pitch to the lowest pitch. Alternatively, the source notes may be ranked by proximity to a pre-selected or desired frequency, i.e. according to how close the frequency of each source note's pitch is to the pre-selected or desired frequency.
In some embodiments, block 320 is included, wherein for each of the sorted target notes, the corresponding source note is determined as the source note having the same rank in the source list as the target note has in the target list. For example, once the source notes have been ranked, the ranked pitch information of each target note is matched with the ranked pitch information of the source note that has the same rank in the source list. In some embodiments, the highest pitch of the target notes is matched with the highest pitch of the source notes, and the lowest pitch of the target notes is matched with the lowest pitch of the source notes. The pitch of each target note is then replaced with the matched pitch of the source note.
In some embodiments, the method further includes block 330 wherein for each of the sorted target notes, its corresponding source note is determined based on the determined probability distribution.
In some embodiments, block 310 includes block 314 wherein the replacing comprises determining a probability distribution of the plurality of source notes based on the pitch of each of the source notes.
In some embodiments, block 314 further comprises block 316, wherein the determining of the probability distribution is by means of a pre-trained model. For example, the pre-trained model uses machine learning such as neural networks. In some embodiments, the pre-trained model selects the appropriate target notes and replaces them with the source notes depending on a pre-defined or desired pitch. For example, a deep neural network (DNN) is trained to predict the pitches of all the notes in a MIDI file with degraded pitch information. This is done by inputting MIDI files whose pitch information has been degraded. The DNN predicts a probability vector of dimension 128 for each note with degraded pitch information (e.g., one component for each pitch in the MIDI format). The error is then measured by summing the distance between the predicted probability vector and a one-hot vector over all notes. The one-hot vector has a 1 at the actual pitch of the note (i.e., its pitch in the original MIDI file). The distance may be any vector distance (e.g., root-mean-square).
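The error measure described above can be sketched in plain Python, leaving the DNN itself out of scope. The sketch assumes root-mean-square as the vector distance, one of the options the text mentions. `one_hot`, `rms_distance` and `pitch_loss` are hypothetical names:

```python
import math
from typing import List


def one_hot(pitch: int, size: int = 128) -> List[float]:
    """Vector with a 1 at the actual MIDI pitch and 0 elsewhere."""
    v = [0.0] * size
    v[pitch] = 1.0
    return v


def rms_distance(a: List[float], b: List[float]) -> float:
    """Root-mean-square distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pitch_loss(predicted: List[List[float]], true_pitches: List[int]) -> float:
    """Sum, over all notes, of the distance between prediction and one-hot target."""
    return sum(rms_distance(p, one_hot(t, len(p)))
               for p, t in zip(predicted, true_pitches))


print(pitch_loss([one_hot(61)], [60]))  # 0.125
```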
In some embodiments, the pre-trained model is trained on short fragments (e.g., ranging from 1 beat to 2 measures). In some embodiments, the degradation of the pitch information may follow one of several strategies. The first strategy (Strategy 1) is to remove the pitch information (e.g., each note being represented by its start time, its end time, its velocity, and a vector of dimension 128 with 0 everywhere). The second strategy (Strategy 2) is to replace the pitch information by ranking information (e.g., the lowest pitch is replaced by value 0, the second lowest by value 1, and so on). The third strategy (Strategy 3) is to “blur” the pitch information (e.g., the actual pitch p, where 0≤p≤127, is replaced by a vector of dimension 128 with 0 everywhere except “near” p, e.g., with 1 at indices p−5, . . . , p, . . . , p+5).
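The three degradation strategies can be sketched as follows. The sketch covers only the pitch part of each note; start time, end time and velocity are left untouched. The function names are hypothetical, and the window half-width of 5 follows the example above. Clipping the window to 0-127 is an assumption, as is giving equal pitches the same rank in Strategy 2:

```python
from typing import List

PITCHES = 128  # number of MIDI pitches


def strategy1_remove() -> List[int]:
    """Strategy 1: drop the pitch entirely (all-zero vector of dimension 128)."""
    return [0] * PITCHES


def strategy2_rank(pitches: List[int]) -> List[int]:
    """Strategy 2: replace each pitch by its rank (lowest pitch -> 0, next -> 1, ...).

    Assumption: equal pitches share a rank (dense ranking).
    """
    rank_of = {p: r for r, p in enumerate(sorted(set(pitches)))}
    return [rank_of[p] for p in pitches]


def strategy3_blur(pitch: int, width: int = 5) -> List[int]:
    """Strategy 3: 1 at indices pitch - width .. pitch + width (clipped), else 0."""
    v = [0] * PITCHES
    for i in range(max(0, pitch - width), min(PITCHES - 1, pitch + width) + 1):
        v[i] = 1
    return v


print(strategy2_rank([67, 60, 64, 60]))  # [2, 0, 1, 0]
```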
In some embodiments, the trained DNN is used to perform rhythm-harmony transfer between MIDI files. For example, consider two MIDI fragments H and R of lengths similar to those used to train the DNN. An input T is created for the DNN by degrading the pitch information of R following one of Strategy 1, Strategy 2, or Strategy 3. For each note with degraded pitch information in T, the DNN predicts a probability vector of dimension 128.
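The text does not spell out how the predicted vectors are turned into pitches taken from H. One plausible choice, assumed here purely for illustration, is a greedy one-to-one matching. It repeatedly takes the most probable remaining (note, source pitch) pair and assigns that pitch to that note, until every note has a pitch. `assign_source_pitches` is a hypothetical name, and the note counts are assumed to have been balanced beforehand:

```python
from typing import List


def assign_source_pitches(probs: List[List[float]],
                          source_pitches: List[int]) -> List[int]:
    """Give each degraded note one pitch from the source fragment H, one-to-one.

    probs[i][p] is the predicted probability that note i has MIDI pitch p.
    Greedy matching is an illustrative assumption, not taken from the text.
    """
    if len(probs) != len(source_pitches):
        raise ValueError("balance the note counts first")
    # Every (probability, note index, source slot) candidate, most probable first.
    candidates = sorted(
        ((probs[i][p], i, j)
         for i in range(len(probs))
         for j, p in enumerate(source_pitches)),
        reverse=True,
    )
    result: List[int] = [-1] * len(probs)
    used_slots = set()
    for _, i, j in candidates:
        if result[i] == -1 and j not in used_slots:
            result[i] = source_pitches[j]
            used_slots.add(j)
    return result


# Both notes favour pitch 64; note 0 is more confident, so note 1 gets 57.
p0 = [0.0] * 128
p0[64], p0[57] = 0.8, 0.1
p1 = [0.0] * 128
p1[64], p1[57] = 0.7, 0.3
print(assign_source_pitches([p0, p1], [57, 64]))  # [64, 57]
```

If the total probability should be maximized, an optimal assignment method (e.g. the Hungarian algorithm) could replace the greedy loop.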
In some embodiments, block 310 further comprises block 318, wherein the replacing comprises: if the plurality of source notes contains a higher number of notes than the plurality of target notes, removing, e.g. randomly, at least one source note from the plurality of source notes or adding at least one note, e.g. an octave note, to the plurality of target notes, such that the plurality of source notes contains the same number of notes as the plurality of target notes; or, if the plurality of source notes contains a lower number of notes than the plurality of target notes, removing, e.g. randomly, at least one target note from the plurality of target notes or adding at least one note, e.g. an octave note, to the plurality of source notes, such that the plurality of source notes contains the same number of notes as the plurality of target notes. In some embodiments, the note removed is the lowest-ranked note or the highest-ranked note. In another embodiment, the note removed is the one farthest from a desired note.
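The removal policies can be sketched on plain pitch lists. The sketch assumes ascending ranking, so that the lowest-ranked note is the lowest pitch, and treats the desired note as a MIDI pitch. `drop_one`, `balance_by_removal` and the default desired pitch 60 are hypothetical. For target notes, removing a pitch stands for dropping the whole note, rhythm included:

```python
from typing import List, Tuple


def drop_one(pitches: List[int], policy: str, desired: int = 60) -> List[int]:
    """Remove one pitch: 'lowest', 'highest', or 'farthest' from `desired`."""
    if not pitches:
        raise ValueError("nothing to remove")
    if policy == "lowest":
        victim = min(pitches)
    elif policy == "highest":
        victim = max(pitches)
    elif policy == "farthest":
        victim = max(pitches, key=lambda p: abs(p - desired))
    else:
        raise ValueError(f"unknown policy: {policy}")
    out = list(pitches)
    out.remove(victim)  # removes the first occurrence only
    return out


def balance_by_removal(source: List[int], target: List[int], policy: str,
                       desired: int = 60) -> Tuple[List[int], List[int]]:
    """Drop notes from the longer list until both lists have the same length."""
    source, target = list(source), list(target)
    while len(source) > len(target):
        source = drop_one(source, policy, desired)
    while len(target) > len(source):
        target = drop_one(target, policy, desired)
    return source, target


print(balance_by_removal([48, 60, 64, 67, 84], [60, 62, 64], "farthest", 64))
# ([60, 64, 67], [60, 62, 64])
```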
Note that, in some embodiments, method 300 is performed for a plurality of portions of the source and target files. For example, method 300 is performed for temporally aligned portions of the composition, such as beats or frames. Thus, the harmony of each frame of the target file is replaced by the harmony of a corresponding frame of the source file.
Embodiments of the present invention may be conveniently implemented using one or more conventional general-purpose or specialized digital computers, computing devices, machines, or microprocessors, including one or more processors, memory and/or computer-readable storage media programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art. In some embodiments, the present invention includes a computer program product 82 which is a non-transitory storage medium or computer-readable medium (media) having instructions 83 stored thereon/in, in the form of computer-executable components or software (SW), which can be used to program a computer to perform any of the methods/processes of the present invention.
The present disclosure has mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the present disclosure, as defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
19210729 | Nov 2019 | EP | regional |
This application claims priority to U.S. Prov. App. No. 63/054,148, filed Jul. 20, 2020, and European App. No. EP19210729, filed Nov. 21, 2019, each of which is hereby incorporated by reference in its entirety.
Number | Date | Country |
---|---|---|
20210158791 A1 | May 2021 | US |
Number | Date | Country |
---|---|---|
63054148 | Jul 2020 | US |