This application claims priority to European Patent Application No. 19160593, filed Mar. 4, 2019, which is hereby incorporated by reference in its entirety.
The present disclosure relates to a method and an editor for editing an audio file.
Music performance can be represented in various ways, depending on the context of use: printed notation, such as scores or lead sheets, audio signals, or performance acquisition data, such as piano-rolls or Musical Instrument Digital Interface (MIDI) files. Each of these representations captures partial information about the music that is useful in certain contexts, with its own limitations. Printed notation offers information about the musical meaning of a piece, with explicit note names and chord labels (in, e.g., lead sheets), and precise metrical and structural information, but it tells little about the sound. Audio recordings render timbre and expression accurately, but provide no information about the score. Symbolic representations of musical performance, such as MIDI, provide precise timings and are therefore well adapted to edit operations, either by humans or by software.
A need for editing musical performance data may arise from two situations. First, musicians often need to edit performance data when producing a new piece of music. For instance, a jazz pianist may play an improvised version of a song, but this improvisation should be edited to accommodate a posteriori changes in the structure of the song. The second need comes from the rise of Artificial Intelligence (AI)-based automatic music generation tools. These tools usually work by analysing existing human performance data to produce new data. Whatever the algorithm used for learning and generating music, these tools call for editing means that preserve as far as possible the expressiveness of original sources.
However, editing music performance data raises special issues related to the ambiguous nature of musical objects. A first source of ambiguity may be that musicians produce many temporal deviations from the metrical frame. These deviations may be intentional or subconscious, but they may play an important part in conveying the groove or feeling of a performance. Relations between musical elements are also usually implicit, creating even more ambiguity. A note is in relation with the surrounding notes in many possible ways, e.g. it can be part of a melodic pattern, and it can also play a harmonic role with other simultaneous notes, or be a pedal-tone. All these aspects, although not explicitly represented, may play an essential role that should preferably be preserved, as much as possible, when editing such musical sequences.
The MIDI file format has been successful in the instrument industry and in music research, and MIDI editors are known, for instance in Digital Audio Workstations. However, the problem of editing MIDI with semantic-preserving operations has not previously been addressed. Attempts to provide semantically-preserving edit operations have been made in the audio domain (e.g. by Whittaker, S., and Amento, B. “Semantic speech editing”, in Proceedings of the SIGCHI conference on Human factors in computing systems (2004), ACM, pp. 527-534) but these attempts are not transferable to music performance data, as explained below.
In human-computer interactions, cut, copy and paste are the so-called holy trinity of data manipulation. These three commands have proved so useful that they are now incorporated in almost every software application, such as word processing, programming environments, graphics creation, photography, audio signal, or movie editing tools. Recently, they have been extended to run across devices, enabling moving text or media from, for instance, a smartphone to a computer. These operations are simple and have clear, unambiguous semantics: cut, for instance, consists of selecting some data, say a word in a text, removing it from the text, and saving it to a clipboard for later use.
Each type of data to be edited raises its own editing issues that have led to the development of specific editing techniques. For instance, editing of audio signals usually requires cross fades to prevent clicks. Similarly, in movie editing, fade-in and fade-out are used to prevent harsh transitions in the image flow. Edge detection algorithms were developed to simplify object selection in image editing. The case of MIDI data is no exception. Every note in a musical work is related to the preceding, succeeding, and simultaneous notes in the piece. Moreover, every note is related to the metrical structure of the music.
It is an objective of the present disclosure to address the issue of editing musical performance data represented as an editable audio file, e.g. MIDI, while preserving as much as possible its semantics.
According to an aspect of the present disclosure, there is provided a method for editing an audio file. The audio file comprises information about a time stream having a plurality of tones extending over time in said stream. The method comprises cutting the stream at a first time point of the stream, producing a first cut having a first left cutting end and a first right cutting end. The method also comprises allocating a respective memory cell to each of the first cutting ends. The method also comprises, in each of the memory cells, storing information about those of the plurality of tones which extend to the cutting end to which the memory cell is allocated. The method also comprises, for each of at least one of the first cutting ends, concatenating the cutting end with a further stream cutting end which has an allocated memory cell with information stored therein about those tones which extend to said further cutting end. The concatenating comprises using the information stored in the memory cells of the first cutting end and the further cutting end for adjusting any of the tones extending to the first cutting end and the further cutting end.
The method aspect may e.g. be performed by an audio editor running on a dedicated or general purpose computer.
According to another aspect of the present disclosure, there is provided a computer program product comprising computer-executable components for causing an audio editor to perform the method of any preceding claim when the computer-executable components are run on processing circuitry comprised in the audio editor.
According to another aspect of the present disclosure, there is provided an audio editor configured for editing an audio file. The audio file comprises information about a time stream having a plurality of tones extending over time in said stream. The audio editor comprises processing circuitry, and data storage storing instructions executable by said processing circuitry whereby said audio editor is operative to cut the stream at a first time point of the stream, producing a first cut having a first left cutting end and a first right cutting end. The audio editor is also operative to allocate a respective memory cell of the data storage to each of the first cutting ends. The audio editor is also operative to, in each of the memory cells, store information about those of the plurality of tones which extend to the cutting end to which the memory cell is allocated. The audio editor is also operative to, for each of at least one of the first cutting ends, concatenate the cutting end with a further stream cutting end which has an allocated memory cell of the data storage with information stored therein about those tones which extend to the further cutting end. The concatenating comprises using the information stored in the memory cells of the first cutting end and the further cutting end for adjusting any of the tones extending to the first cutting end and the further cutting end.
Further, some embodiments of the present disclosure provide a system for editing an audio file, the audio file comprising information about a time stream having a plurality of tones extending over time in said time stream, the system comprising: one or more processors; and memory storing one or more programs, the one or more programs including instructions, which, when executed by the one or more processors, cause the one or more processors to perform any of the methods described herein.
Further, some embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing one or more programs for editing an audio file, the audio file comprising information about a time stream having a plurality of tones extending over time in said time stream, wherein the one or more programs include instructions, which, when executed by a system with one or more processors, cause the system to perform any of the methods described herein.
It is to be noted that any feature of any of the aspects may be applied to any other aspect, wherever appropriate. Likewise, any advantage of any of the aspects may apply to any of the other aspects. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following detailed disclosure, from the attached dependent claims as well as from the drawings.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to “a/an/the element, apparatus, component, means, step, etc.” are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated. The use of “first”, “second” etc. for different features/components of the present disclosure is only intended to distinguish the features/components from other similar features/components and not to impart any order or hierarchy to the features/components.
Embodiments will be described, by way of example, with reference to the accompanying drawings, in which:
Embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments are shown. However, other embodiments in many different forms are possible within the scope of the present disclosure. Rather, the following embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. Like numbers refer to like elements throughout the description.
Herein, the problem of editing non-quantized, metrical musical sequences represented as e.g. MIDI files is discussed. A number of problems caused by the use of naive edition operations applied to performance data are presented using a motivating example of
An edit operation is illustrated, in which two beats of a measure, between a first time point tA and a second time point tB (illustrated by dashed lines in the figure) are cut out and inserted in a later measure of the stream, in a cut at a third time point tC. To perform the edit operation, three cuts A, B and C are made at the first, second and third time points tA, tB and tC, respectively. The first cut A produces a first left cutting end AL and a first right cutting end AR. The second cut B produces a second left cutting end BL and a second right cutting end BR. The third cut C produces a third left cutting end CL and a third right cutting end CR.
In the stream S of
Cut, copy, and paste operations may be performed using two basic primitives: split and concatenate. The split primitive is used to separate an audio stream S (or MIDI file) at a specified temporal position, e.g. time point tA, yielding two streams (see e.g. streams S1 and S2 of
1. Cut sequence S at time point tB, which returns streams S1 and S2.
2. Cut the first sequence S1 (the part of S before time point tB) at time point tA, which returns streams S3 and S4, S4 corresponding to the section between time points tA and tB.
3. Store sequence S4 to a digital clipboard.
4. Return the concatenation of S3 and S2.
Similarly, to insert a stream, e.g. stored stream S4 (as above), in a stream S at time point tC, one may:
1. Cut the stream S at time point tC, producing two streams S1 (duration of S prior to tC) and S2 (duration of S after tC), not identical to S1 and S2 discussed above.
2. Return the concatenation of S1, S4, and S2, in this order.
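The two procedures above can be sketched in a few lines of Python. This is a minimal illustration of the naive split and concatenate primitives only; it does not use the memory cells described below. The names Note, Stream, split, concatenate, cut and insert are hypothetical and are not taken from the disclosure. The sketch assumes tA is earlier than tB, so the second split of the cut procedure is applied to the part of the stream preceding tB.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Note:
    pitch: int
    start: float  # onset, measured from the start of its stream
    end: float    # offset, measured from the start of its stream


@dataclass
class Stream:
    notes: list = field(default_factory=list)
    length: float = 0.0


def split(s, t):
    """Naive split at time t: a note crossing t is cut into two parts."""
    left, right = [], []
    for n in s.notes:
        if n.end <= t:
            left.append(n)
        elif n.start >= t:
            right.append(Note(n.pitch, n.start - t, n.end - t))
        else:  # the note crosses the cut
            left.append(Note(n.pitch, n.start, t))
            right.append(Note(n.pitch, 0.0, n.end - t))
    return Stream(left, t), Stream(right, s.length - t)


def concatenate(*streams):
    """Naive concatenation: shift each stream by the total length before it."""
    notes, offset = [], 0.0
    for s in streams:
        notes += [Note(n.pitch, n.start + offset, n.end + offset) for n in s.notes]
        offset += s.length
    return Stream(notes, offset)


def cut(s, t_a, t_b):
    """Remove the section [t_a, t_b); return (remaining stream, clipboard)."""
    s1, s2 = split(s, t_b)   # s1: before t_b, s2: after t_b
    s3, s4 = split(s1, t_a)  # s3: before t_a, s4: between t_a and t_b
    return concatenate(s3, s2), s4


def insert(s, clip, t_c):
    """Insert the clipboard stream into s at time t_c."""
    s1, s2 = split(s, t_c)
    return concatenate(s1, clip, s2)
```

Because every note that crosses a cut is simply chopped, this naive version can leave short fragments and broken long notes at the seams, which is the kind of problem the stored cutting-end information is intended to address.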
In the first case, none of the first and second tones T1 and T2 extend to the cut A, resulting in both left and right memory cells being empty, indicated as (0,0).
In the second case, the first tone T1 touches the left cutting end AL, resulting in information about said first tone T1 being stored in the left memory cell as (12,0) indicating that the first tone extends 12 units of time to the left of the cut A but no time unit to the right of the cut A. None of the first and second tones T1 and T2 extends to the right cutting end AR (i.e. none of the tones extends to the cut A from the right of the cut), which is why the right memory cell is empty.
Conversely, in the third case, the second tone T2 touches the right cutting end AR, resulting in information about said second tone T2 being stored in the right memory cell as (0,5) indicating that the second tone extends 5 units of time to the right of the cut A but no time unit to the left of the cut A. None of the first and second tones T1 and T2 extends to the left cutting end AL (i.e. none of the tones extends to the cut A from the left of the cut), which is why the left memory cell is empty.
In the fourth case, both of the first and second tones T1 and T2 touch respective cutting ends AL and AR (i.e. both tones end at tA, without overlapping in time). Thus, information about the first tone T1 is stored in the left memory cell as (12,0) indicating that the first tone extends 12 units of time to the left of the cut A but no time unit to the right of the cut A, and information about the second tone T2 is stored in the right memory cell as (0,5) indicating that the second tone extends 5 units of time to the right of the cut A but no time unit to the left of the cut A.
In the fifth case, a single (first) tone T1 is shown extending across the cutting time tA and thus being divided in two parts by the cut A. Thus, information about the first tone T1 is stored in the left memory cell as (5,12) indicating that the first tone extends 5 units of time to the left of the cut A and 12 time units to the right of the cut A, and information about the same first tone T1 is stored in the right memory cell, also as (5,12) indicating that the first tone extends 5 units of time to the left of the cut A and 12 time units to the right of the cut A.
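The five cases above can be reproduced with a short Python sketch. It assumes, for illustration only, that each tone is given as a (start, end) pair and that each memory cell is a list of (extent to the left, extent to the right) pairs; the function name cut_memory and the concrete tone positions in the usage example are hypothetical.

```python
def cut_memory(tones, t):
    """Return (left_cell, right_cell) for a cut at time t.

    Each cell lists (extent_left, extent_right) pairs for the tones that
    extend to the corresponding cutting end. Tones are (start, end) pairs
    with start < end.
    """
    left_cell, right_cell = [], []
    for start, end in tones:
        if start < t < end:
            # The tone crosses the cut: both ends remember both extents.
            info = (t - start, end - t)
            left_cell.append(info)
            right_cell.append(info)
        elif end == t:
            # The tone touches the left cutting end only.
            left_cell.append((t - start, 0))
        elif start == t:
            # The tone touches the right cutting end only.
            right_cell.append((0, end - t))
    return left_cell, right_cell


# Fourth case: T1 ends at the cut, T2 starts at the cut (cut at t = 12).
print(cut_memory([(0, 12), (12, 17)], 12))  # ([(12, 0)], [(0, 5)])
# Fifth case: a single tone crosses the cut.
print(cut_memory([(7, 24)], 12))            # ([(5, 12)], [(5, 12)])
```

An empty list plays the role of the empty cell written as (0,0) in the first case.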
As discussed herein, the information stored in the respective memory cells may be used for determining how to handle the tones extending to the cut A when concatenating either of the left and right cutting ends with another cutting end (of the same stream S or of another stream). In accordance with embodiments of the present disclosure, a tone extending to a cutting end can, after concatenating with another cutting end, be adjusted based on the information about the tone stored in the memory cell of the cutting end.
Examples of such adjusting include:
Regarding removal of fragments, in some embodiments, two different duration thresholds may be used, e.g. an upper threshold and a lower threshold. In that case, if the duration of a part of a tone T which is created after making a cut A is below the lower threshold, the part is regarded as a fragment and removed from the audio stream, regardless of its percentage of the original tone duration. On the other hand, if the duration of the part of the tone T which is created after making a cut A is above the upper threshold, the part is kept in the audio stream, regardless of its percentage of the original tone duration. However, if the duration of the part of the tone T which is created after making a cut A is between the upper and lower duration thresholds, whether it is kept or removed may depend on its percentage of the original tone duration, e.g. whether it is above or below a percentage threshold. This may be used e.g. to avoid removal of long tone parts just because they are below a percentage threshold.
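The two-threshold rule described above can be written as a small decision function. The sketch below is one possible reading; the function name keep_part, the parameter names and the numeric values in the usage example are assumptions chosen for illustration, since the disclosure gives no concrete thresholds.

```python
def keep_part(part, original, lower, upper, min_ratio):
    """Decide whether a tone part created by a cut is kept.

    part      -- duration of the part left after the cut
    original  -- duration of the tone before the cut
    lower     -- parts shorter than this are always removed
    upper     -- parts longer than this are always kept
    min_ratio -- between the two, keep only if part/original >= min_ratio
    """
    if part < lower:
        return False  # a fragment, whatever its share of the original tone
    if part > upper:
        return True   # long enough to keep, whatever its share
    return part / original >= min_ratio


# Hypothetical thresholds: lower = 0.1, upper = 0.5, min_ratio = 0.25.
print(keep_part(0.6, 4.0, 0.1, 0.5, 0.25))  # True: long part kept at 15 %
print(keep_part(0.3, 2.0, 0.1, 0.5, 0.25))  # False: mid-length, only 15 %
```

The first usage line shows the purpose of the upper threshold: a long part survives even though it is a small share of the original tone.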
In some embodiments of the present disclosure, the audio file 10 is in accordance with a MIDI file format, which is a well-known editable audio format.
In some embodiments of the present disclosure, the further cutting end BR or CR, or BL or CL is from the same time stream S as the first cutting end AL or AR, e.g. when cutting and pasting within the same stream S. In some embodiments, the further cutting end is a second left or right cutting end BL or BR, or CL or CR of a second cut B or C produced by cutting the stream S at a second time point tB or tC in the stream. In some embodiments, the at least one of the first cutting ends is the first left cutting end AL and the further cutting end is the second right cutting end BR or CR.
In some other embodiments of the present disclosure, the further cutting end BR or CR, or BL or CL is from another time stream than the time stream S of the first cutting end AL or AR, e.g. when cutting from one stream and inserting in another stream.
In some embodiments of the present disclosure, the adjusting comprises any of: removing a fragment of a tone T; extending a tone over the cutting ends AL or AR, and BR or CR, or BL or CL; and merging a tone extending to the first cutting end AL or AR with a tone extending to the further cutting end BR or CR, or BL or CL (e.g. handling splits and quantization issues).
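As a rough illustration of how stored cutting-end information might drive such adjusting, the following Python sketch joins a left cutting end with a right cutting end. It is not the method of the disclosure. It assumes, hypothetically, that each memory cell is a dictionary mapping a pitch to an (extent to the left, extent to the right) pair, that at most one tone per pitch reaches a cutting end, and that a single ratio threshold min_ratio decides what counts as a fragment. The name join_ends is likewise an assumption.

```python
def join_ends(left_cell, right_cell, min_ratio=0.25):
    """Join a left cutting end with a right cutting end.

    Each cell maps pitch -> (extent_left, extent_right) as recorded when
    the respective cut was made. Returns the tones at the seam as
    (pitch, duration_before_seam, duration_after_seam) triples.
    """
    seam, merged = [], set()
    for pitch, (before, after) in left_cell.items():
        other = right_cell.get(pitch)
        if after > 0 and other is not None and other[0] > 0:
            # Both sides hold a cut tone of the same pitch: merge them
            # into one tone extending over the seam.
            seam.append((pitch, before, other[1]))
            merged.add(pitch)
        elif after > 0 and before / (before + after) < min_ratio:
            # A small remainder of a cut tone: remove the fragment.
            continue
        else:
            seam.append((pitch, before, 0))
    for pitch, (before, after) in right_cell.items():
        if pitch in merged:
            continue
        if before > 0 and after / (before + after) < min_ratio:
            continue  # fragment on the right-hand side
        seam.append((pitch, 0, after))
    return seam


left = {60: (5, 12), 64: (1, 9)}   # both tones were cut on the left side
right = {60: (3, 4), 67: (0, 5)}   # pitch 60 was cut, 67 starts at the cut
print(join_ends(left, right))      # [(60, 5, 4), (67, 0, 5)]
```

In the usage example, the two cut tones of pitch 60 are merged into one tone, the short remainder of pitch 64 is removed as a fragment, and the tone of pitch 67 is kept unchanged.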
Embodiments of the present disclosure may be conveniently implemented using one or more conventional general purpose or specialized digital computers, computing devices, machines, or microprocessors, including one or more processors, memory and/or computer readable storage media programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.
In some embodiments, the present disclosure provides a computer program product 3 which is a non-transitory storage medium or computer readable medium (media) having instructions 4 stored thereon/in, in the form of computer-executable components or software (SW), which can be used to program a computer 1 to perform any of the methods/processes of the present disclosure. Examples of the storage medium can include, but are not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
According to a more general aspect of the present disclosure, there is provided a method of editing an audio stream S having at least one tone T extending over time in said stream. The method comprises cutting M1 the stream at a first time point tA of the stream, producing a first cut A having a left cutting end AL and a right cutting end AR. The method also comprises allocating M2 a respective memory cell 5 to each of the cutting ends. The method also comprises, in each of the memory cells, storing M3 information about the tone T. The method also comprises, for one of the cutting ends AL or AR, concatenating M4 the cutting end with a further cutting end BR or CR, or BL or CL which also has an allocated memory cell 5 with information stored therein about any tones T extending to said further cutting end. The concatenating M4 comprises using the information stored in the memory cells 5 for adjusting any of the tones T extending to the cutting ends AL or AR, and BR or CR or BL or CL.
The present disclosure has mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the present disclosure, as defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
19160593.0 | Mar 2019 | EP | regional |