1. Field of the Disclosure
The present disclosure relates to a device and a corresponding method for generating a real time music accompaniment, in particular for playing multi-modal music, i.e. enabling the playing of music in multiple modes. Further, the present disclosure relates to a device and a corresponding method for recording pieces of music for use in generating a real time music accompaniment. Still further, the present disclosure relates to a device and a corresponding method for generating a real time music accompaniment using a transformation of chords.
2. Description of Related Art
Known devices and methods for generating a real time music accompaniment make use of, e.g., so-called “loop pedals” (also called “looping pedals”). Loop pedals are real-time samplers that play back audio played previously by a musician. Such pedals are routinely used for music practice or outdoor “busking”, i.e. generally for generating a real time music accompaniment. However, the known loop pedals always play back the same material, which may make performances monotonous and boring both to the musician and the audience, thereby preventing their uptake in professional concerts.
Further, standard loop pedals often force the musician to play the entire loop once during a “feeding phase” before starting to improvise on top of it, i.e. while the loop is repeated. This can be repetitive when the chord grid is to be played in a stylistically consistent manner (which is the case most of the time). Further, this can be a problem when the loop is played on top of a given chord sequence (or chord grid), because the musician cannot start improvising until the whole grid has been played. Another approach is to pre-record loops. This raises another issue as the audience will not know what is pre-recorded and what is actually performed by the musician. This is a general shortcoming of computer-assisted music performance.
The “background” description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventor(s), to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
It is an object to provide a device and a corresponding method for generating an improved real time music accompaniment. It is a further object to provide a device and a corresponding method for recording pieces of music for use in generating a real time music accompaniment. It is still a further object to provide a corresponding computer program for implementing said methods and a non-transitory computer-readable recording medium.
According to an aspect there is provided a device for generating a real time music accompaniment, said device comprising
According to a further aspect there is provided a corresponding method for generating a real time music accompaniment, said method comprising
According to a further aspect there is provided a device for recording pieces of music for use in generating a real time music accompaniment, said device comprising
According to a further aspect there is provided a corresponding method for recording pieces of music for use in generating a real time music accompaniment, said method comprising
According to still another aspect there is provided a device for generating a real time music accompaniment, said device comprising
According to a further aspect there is provided a corresponding method for generating a real time music accompaniment, said method comprising
According to still further aspects a computer program comprising program means for causing a computer to carry out the steps of the method disclosed herein, when said computer program is carried out on a computer, as well as a non-transitory computer-readable recording medium that stores therein a computer program product, which, when executed by a processor, causes the method disclosed herein to be performed are provided.
Preferred embodiments are defined in the dependent claims. It shall be understood that the claimed method, the claimed computer program and the claimed computer-readable recording medium have similar and/or identical preferred embodiments as the claimed device and as defined in the dependent claims.
One of the aspects of the disclosure is to apply a new approach, e.g. to loop pedals, which is based on an analytical multi-modal representation of the music (audio) input. Instead of simply playing back pre-recorded audio, the proposed device and method enable real-time generation of an audio accompaniment reacting to what is being performed by the musician. By combining two or more music modes automatically, solo musicians can perform duets or trios with themselves, without engendering canned music effects. Accordingly, a supervised classification of input music and, preferably, a concatenative synthesis are performed. This approach opens up new avenues for concert performance.
Another aspect of the disclosure is to enable musicians to quickly feed a loop without having to play it entirely. This is achieved by providing the chord grid and implementing a mechanism that reuses already played bars or chords using e.g. pitch scaling techniques, i.e. to make a transformation (in particular a transposition and/or substitution) of the audio signal, and/or chord substitution rules. Thus, the loop (or, more generally, the real time music accompaniment) is generated from a limited amount of music material, typically a bar or a few bars. Preferably, the “cost” of the transformation is minimized to ensure the greatest quality of the played signal.
Further, the disclosed device and method generate an improved real time music accompaniment that makes performances by use of such a device or method less monotonous and boring both to the musician and the audience and that makes the performances fully understandable by the audience as generally nothing is pre-recorded.
In this context it shall be understood that a piece of music does not necessarily mean a complete song or tune, but generally means one or more chords or beats. The device and method for generating a real time music accompaniment are generally directed to the generation of the accompaniment during a playback phase (or state), i.e. when a musician wants to be accompanied while he is playing. The device and method for recording pieces of music for use in generating a real time music accompaniment are generally directed to the recording of music during a recording phase (or state) that can later be used in a playback phase.
Further, it shall be noted that a chord is generally associated with each “temporal position” in the grid, e.g., a measure, or a beat. A performance is a walk through the sequence of chords. When the musician plays something during a performance, it is systematically associated with the corresponding chord. Thus, chords may generally be three different things, namely a position in the grid, information on the harmony, and a physical chord played on a musical instrument.
It is to be understood that both the foregoing general description and the following detailed description are exemplary, but are not restrictive of the disclosure.
A more complete appreciation of the disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
Solo improvised performance is arguably the most challenging situation for a musician, in particular for jazz. The main reason is that in order to produce an interesting musical discourse, many dimensions of music should be performed simultaneously, such as beat, harmony, bass and melody. A solo musician should incarnate the roles of a whole rhythm section, like in a standard jazz combo such as piano, bass and drums. Additionally, they should improvise a solo while maintaining the rhythm section. Technically this is possible only for few instruments like the piano, but even in that case it requires great virtuosity. For guitars, solo performance is even more challenging as the configuration of the instrument does not allow for multiple simultaneous music streams. In the 80s, virtuoso guitarist Stanley Jordan stunned the musical world by playing simultaneously bass, chords and melodies using a technique called “tapping”. But such techniques are hard to master, and the resulting music, while exciting, is arguably stereotyped.
Several technologies are known to cope with the limitations of solo performers by aiming to extend their expressiveness. One of the most popular is the loop pedal as described in Boss, RC-3 Loop Station Owner's Manual, (2011). Loop pedals are digital samplers that record a music input during a certain time frame, determined by clicking on the pedal.
With such loop pedals the musician typically first records a sequence of chords (or a bass line) and then improvises on top of it. This scheme can be extended to stack up several layers (e.g. chords then bass) using other interactive widgets (e.g. double clicking on the pedal). Loop pedals enable musicians to literally play two (or more) tracks of music in real-time. However, they invariably produce a canned music effect due to the systematic repetition of the recorded loop without any variation whatsoever.
Another popular and inspiring device for enabling solo performance is the minus-one recording, such as the Aebersold series as described in Aebersold, J., How To Play Jazz & Improvise, Book & CD Set, Vol. 1, (2000). With these recordings, the musician is able to play a tune with a fully-fledged professional rhythm section. Though of a different nature, the canned effect is still there: playing with a recording generates stylistic mismatch. Stylistic consistency is lost, as it is no longer only the musician playing, but other, invisible musicians, which eliminates the interactive nature of real-time improvisation and lessens the musical impact on the audience. Consequently, these devices are hardly used in concerts or recordings, and their usage remains limited to practice or busking (low-profile outdoor playing).
Previous works have attempted to extend traditional instruments, such as the guitar, by using real-time signal analysis and synthesis. For example, Lähdeoja, O., An approach to instrument augmentation: the electric guitar, Proc. New Interfaces for Musical Expression conference, NIME (2008) showed how to detect fine-grained playing modes from the analysis of the incoming guitar signal, and Reboursière, L. Frisson, C. Lähdeoja, O. Anderson, J. Iii, M. Picard, C. Todoroff, T., Multimodal Guitar: A Toolbox For Augmented Guitar Performances, Proc. of NIME, (2010) proposed a rearranging loop pedal that detects and reshuffles randomly note events within a loop. In Hamanaka, M. Goto, M. Asoh, H. and N. Otsu, A learning-based jam session system that imitates a player's personality model. IJCAI, pp. 51-58, (2003) a MIDI-based model of an improviser's personality is proposed, to build a virtual trio system, but it is not clear how it can be used in realistic performance scenarios requiring a predetermined harmony and tempo. Finally, Cherla, S., Automatic Phrase Continuation from Guitar and Bass-guitar Melodies, Master thesis, UPF, (2011) proposes an audio-based method for generating stylistically consistent phrases from a guitar or bass but this applies only to monophonic melodies. Omax is a system for live improvisation that plays musical sequences built incrementally and in real-time from a live MIDI or Audio source as described in Lévy, B., Bloch, G., Assayag, G., OMaxist Dialectics: Capturing, Visualizing and Expanding Improvisations, Proc. NIME 2012, Ann Arbor, 2012. Omax uses feature similarity and concatenative synthesis to build clones of the musician, thus extending the instrument by creating rich textures by superimposing the musician's input with the clones. This makes this approach suitable for free musical improvisation. 
Although reflexive loop pedals bear many technical similarities with Omax, they are intended for traditional (solo) jazz improvisation involving harmonic and temporal constraints as well as combining heterogeneous instruments and/or modes of playing, as will be explained below.
Observing real jazz combos (duos or trios) gives clues to what a natural extension of a jazz instrument could be. In a jazz duo for instance, musicians typically alternate between comping (providing harmony with chords) and solo (e.g. melodies). Each musician also adapts in a mimetic way to the other(s), for instance in terms of energy, pitch or note density. Based on these observations, so-called Reflexive Loop Pedals are proposed representing a novel approach to loop pedals that enables musicians to expand their musical competence as if they were playing in a duo or trio with themselves, but which avoids the canned music effect of pedals or minus-one recordings. This is achieved by enforcing stylistic consistency (no external pre-recorded material is used) while allowing natural interaction between the human and the played back material. In the following the proposed approach will be explained in more detail and a solo guitar performance will be described as an example.
Such a device is referred to as a reflexive loop pedal (RLP) herein. RLPs follow the same basic principle as standard loop pedals: they play back music material performed previously by the musician. RLPs differ in at least one aspect: RLPs manage to differentiate between several playing modes, such as bass, harmony (chords) and solo melodies. Depending on the mode the musician is playing at any point in time, the device will play differently, following the “other members” principle. For instance, if the musician plays a solo, the RLP will play bass and/or chords. If the musician plays chords, the RLP will play bass and/or solo, etc. This rule ensures that the overall performance is close to a natural music combo, where in most cases bass, chords and solo are always present but never overlap.
In a preferred embodiment the playback material is determined not only according to the current position in the loop, but also to a predetermined chord grid and/or to the current playing of the musician, in particular through feature-based similarity. This ensures that any generated accompaniment actually follows the musician's playing. A corresponding second embodiment of a device 30 for generating a real time music accompaniment according to the present disclosure is schematically shown in
Preferably, said music analyzer 31 is configured to obtain a music piece description comprising one or more of pitch, bar, key, tempo, distribution of energy, average energy, peaks of energy, number of peaks, spectral centroid, energy level, style, chords, volume, density of notes, number of notes, mean pitch, mean interval, highest pitch, lowest pitch, pitch variety, harmony duration, melody duration, interval duration, chord symbols, scales, chord extensions, relative roots, zone, type of instrument(s) and tune of an analyzed piece of music.
Optionally, as shown in
In a preferred implementation, input music is received both as an audio and a MIDI stream. Accordingly, the music input interface 21 preferably comprises a midi interface 21a and/or an audio interface 21b for receiving said pieces of music in midi format and/or in audio format as also shown in
Like in many jazz accompaniment systems, a chord grid is provided a priori as explained above and as illustrated in the following table.
Said table shows a typical chord grid. Some chords are repeated (e.g. here, C min and F maj7), providing more choice for the device and method during generation of the accompaniment. The chord grid is preferably used to label each played beat with the corresponding chord. A preferred constraint imposed to RLPs is that each played-back audio segment should correspond to the correct chord in the chord grid. A grid often contains several occurrences of the same chord which enables the device to reuse a given recording for a chord several times, which increases its ability to adapt to the current playing of the musician.
Further, in still another implementation a tempo is preferably provided as well, e.g. via an optionally provided tempo interface 33 (also shown in
In the following, it will be described how, in an embodiment, the device and method can automatically classify the musician's input into the different music modes. In this context, musically meaningful macro modes are considered, corresponding to different musical intentions, such as bass, chords and solo. In particular, mode classification for guitar will be considered, but this applies to other instruments, e.g. the piano, with the same performance.
In this exemplary embodiment a corpus of 8 standard jazz tunes in various tempos and feels (e.g. Bluesette, Lady Bird, Nardis, Ornithology, Solar, Summer Samba, The Days of Wine and Roses, and Tune up) is built. For each tune, three guitar performances of the same duration (about 4′) were recorded: one with bass, one with chords, and one with solos, by playing e.g. along with an Aebersold minus-one recording. For each performance both audio and MIDI (e.g. using a Godin MIDI guitar) were recorded, for a total of 5,418 bars. The MIDI input is segmented into one-bar ‘chunks’, at the given tempo. Chunks are not synchronized to the beat, to ensure that the resulting classifier is robust, i.e. is able to readily classify any musical input, including ones that are out of time, which is a common technique used in jazz.
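The segmentation into one-bar chunks can be sketched as follows. This is a minimal illustration only, not the implementation of the disclosure: the function name `segment_into_bars`, the representation of a note as an `(onset_seconds, pitch)` pair, the default of four beats per bar and the `offset` parameter (used here to model chunks that are not aligned to the beat) are all assumptions made for the example.

```python
from collections import defaultdict


def segment_into_bars(notes, tempo_bpm, beats_per_bar=4, offset=0.0):
    """Group (onset_seconds, pitch) events into one-bar chunks.

    Assumptions for illustration: a constant tempo, 4 beats per bar by
    default, and an arbitrary `offset` so that chunk boundaries need not
    coincide with the beat. Bars without any note are skipped.
    """
    bar_duration = beats_per_bar * 60.0 / tempo_bpm  # seconds per bar
    chunks = defaultdict(list)
    for onset, pitch in notes:
        index = int((onset - offset) // bar_duration)
        chunks[index].append((onset, pitch))
    return [chunks[i] for i in sorted(chunks)]


# At 120 bpm and 4 beats per bar, one bar lasts 2 seconds.
example_notes = [(0.0, 40), (1.9, 43), (2.0, 45), (4.1, 47)]
example_chunks = segment_into_bars(example_notes, tempo_bpm=120)
```

Each resulting chunk could then be passed to feature extraction and mode classification; those steps are outside the scope of this sketch.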
One tune (e.g. Bluesette) was used to perform feature selection. The initial feature set contains 20 MIDI features related to pitch, duration, velocity, and statistical moments thereof, and three specific bar structure features: harmony-dur, melody-dur, interval-dur (dur meaning duration here) as shown in
In particular,
A Support Vector Machine (SVM) classifier (e.g. Weka's SMO) is preferably used and trained on the labeled data with the selected features. The following table shows the performance of the SVM classifier (SVM being a standard machine-learning technique) on each individual tune, measured with a 10-fold cross-validation with a normalized poly-kernel. The last row shows the performance of the classifier trained on all 8 tunes. As indicated in said table, classification results are near perfect, ensuring robust mode identification during performance.
During performance, audio streams are preferably generated using concatenative synthesis from audio material previously played and classified. Generation is done according to two principles.
The first principle is called “the other members principle”. The currently played music is analyzed by the mode classifier, which determines the two other music modes to generate (e.g. bass=>chords & solo, chords=>bass & solo, solo=>bass & chords). In case no previously played bar is yet available, the generation outputs silence.
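The “other members” principle described above can be sketched as a simple mapping from the classified mode to the modes to be generated. The names `MODES`, `modes_to_generate` and `generate_accompaniment` are hypothetical, and picking the most recently recorded bar is only a placeholder assumption; the disclosure selects bars by chord and feature similarity.

```python
MODES = ("bass", "chords", "solo")


def modes_to_generate(played_mode):
    """Return the modes the device should play, i.e. all modes except the played one."""
    if played_mode not in MODES:
        raise ValueError(f"unknown mode: {played_mode!r}")
    return [mode for mode in MODES if mode != played_mode]


def generate_accompaniment(played_mode, recorded):
    """Pick one bar per generated mode.

    `recorded` maps a mode to the list of bars recorded so far in that mode.
    None stands for silence when no bar of that mode is available yet.
    Taking the last recorded bar is a placeholder for the real selection.
    """
    output = {}
    for mode in modes_to_generate(played_mode):
        bars = recorded.get(mode, [])
        output[mode] = bars[-1] if bars else None
    return output
```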
The second principle is called “feature-based interaction”. According to an aspect the proposed device and method do not simply play back a recorded sequence, but generate a new one, adapted to the current real-time performance of the musician. This is preferably achieved using feature-based similarity (in particular using a music piece description as explained above). Audio features from the user's input music are extracted. For instance, in an implementation the user features are RMS (mean energy of the bar), hit count (number of peaks in the signal) and spectral centroid, though other MPEG-7 features could be used (see, e.g., Peeters, G., A large set of audio features for sound description (similarity and classification) in the CUIDADO project, Ircam Report (2000)). The device and method attempt to find and play back recorded bars of the right modes (say, chords and bass if the user is playing melody), correct grid chord (say, C min), and that best match the user features. Feature matching is preferably performed using Euclidean distance.
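The selection step described above (right mode, correct grid chord, closest features by Euclidean distance) might look like the following sketch. The dictionary layout of a recorded bar and the function name `best_matching_bar` are assumptions made for illustration, and the feature tuples are assumed to be already normalized to comparable ranges, since raw RMS, hit count and spectral centroid live on very different scales.

```python
import math


def best_matching_bar(recorded_bars, mode, chord, user_features):
    """Return the recorded bar of the given mode and chord that is closest
    to `user_features` in Euclidean distance, or None if no bar qualifies.

    Each bar is assumed to be a dict with the keys "mode", "chord" and
    "features", where "features" is a tuple such as
    (rms, hit_count, spectral_centroid), normalized beforehand.
    """
    candidates = [
        bar for bar in recorded_bars
        if bar["mode"] == mode and bar["chord"] == chord
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda bar: math.dist(bar["features"], user_features))


example_bars = [
    {"id": "a", "mode": "bass", "chord": "C min", "features": (0.2, 0.3, 0.4)},
    {"id": "b", "mode": "bass", "chord": "C min", "features": (0.8, 0.9, 0.7)},
    {"id": "c", "mode": "chords", "chord": "C min", "features": (0.8, 0.9, 0.7)},
]
loud_match = best_matching_bar(example_bars, "bass", "C min", (0.7, 0.8, 0.6))
```

With the example data, a loud, dense input selects bar "b" rather than bar "a", because "b" is the nearest bass bar recorded on C min.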
Audio generation is preferably performed using concatenative synthesis as e.g. described in Schwarz, D., Current research in Concatenative Sound Synthesis, Proc. Int. Computer Music Conf. (2005). Thus, audio beats are concatenated in the time domain and crossfaded to avoid audio clicks.
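Time-domain concatenation with crossfading can be illustrated with the following sketch. It is a simplification under stated assumptions: audio segments are plain lists of float samples, the crossfade is linear, and adjacent segments are overlapped by `fade_len` samples, which shortens the result by that amount at each joint. The function name and the choice of a linear fade are not taken from the disclosure.

```python
def concatenate_with_crossfade(segments, fade_len):
    """Concatenate sample lists, linearly crossfading at each joint.

    The last `fade_len` samples of the output so far are mixed with the
    first `fade_len` samples of the next segment, so that there is no
    abrupt discontinuity (audible click) at the boundary.
    """
    if not segments:
        return []
    output = list(segments[0])
    for segment in segments[1:]:
        n = min(fade_len, len(output), len(segment))
        start = len(output) - n
        for i in range(n):
            gain_in = (i + 1) / (n + 1)  # rises towards 1 across the fade
            output[start + i] = output[start + i] * (1.0 - gain_in) + segment[i] * gain_in
        output.extend(segment[n:])
    return output


faded = concatenate_with_crossfade([[1.0] * 4, [0.0] * 4], fade_len=2)
```

In the example, the joint between a constant segment at 1.0 and a silent segment passes through intermediate values instead of jumping directly from 1.0 to 0.0.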
The proposed approach is demonstrated with a solo guitar performance with the system on the tune “Solar” by Miles Davis. During this 2′50″ performance, the 12-bar tune is played 9 times. The musician played alternately chords, solos, and bass, and the device and method reacted according to the “other members” principle. Moreover, the device and method generated an accompaniment that matches the overall energy of the musician: soft passages are accompanied with low-intensity bass lines (i.e., bass lines with few notes as the hit count user feature is considered), and with low-energy harmonic bars (i.e., with soft chords, as user feature RMS is considered), and conversely.
A third, more general embodiment of a device 70 for generating a real time music accompaniment according to the present disclosure is shown in
Optionally, as shown in dashed lines, the device 70 further comprises a music exchange interface 71 that is configured to record pieces of music received at said music input interface 21 along with its classified music mode in an external music memory 72, e.g. an external hard disk, computer storage or other memory provided external to the device (for instance, storage space provided in a cloud or the internet). The music selector 26 is configured accordingly to select, via said music exchange interface 71, one or more pieces of music from the pieces of music recorded in said external music memory 72 as real time music accompaniment.
The above explained embodiments of the device and method mainly relate to the playback phase or to both the recording phase and the playback phase. According to another aspect the present disclosure also relates to a device and a corresponding method for recording pieces of music for use in generating a real time music accompaniment, i.e. said device and method relating to the recording phase only. An embodiment of such a device 80 is shown in
The recorder 83 can be implemented as music storage like e.g. the music storage 23 or may be configured to directly record on such a music storage. In another embodiment the recorder 83 can be implemented as music exchange interface like e.g. the music exchange interface 71 to record on an external music memory.
The above described device and method address two critical problems of existing music extension devices, namely lack of adaptiveness (loop pedals are too repetitive) and stylistic mismatch (playing along with minus-one recordings generates stylistic inconsistency). The above described approach is based on a multi-modal analysis of solo performance that preferably classifies every incoming bar automatically into one of a given set of music modes (e.g. bass, chords, solo). An audio accompaniment is generated that best matches what is currently being performed by the musician, preferably using feature matching and mode identification, which brings adaptiveness. Further, it consists exclusively of bars the user played previously in the performance, which ensures stylistic consistency.
As a consequence, a solo performer can perform as a jazz trio, interacting with themselves on any chord grid, providing a strong sense of musical cohesion, and without creating a canned music effect.
The new kind of interaction described above with regard to
A preferred implementation uses a MIDI stream for mode classification. MIDI is available from synthesizers, some pianos or guitars, but not all instruments. Current work addresses the identification of robust audio features required to perform mode classification directly from the audio signal. This will generalize the approach to any instrument. In another embodiment there is a MIDI implementation and an AUDIO implementation. These two implementations are exclusive.
A schematic block diagram of an embodiment of a device 90 according to the present disclosure is shown in
The device 90 according to this embodiment allows generating a loop from a limited amount of music material, typically a bar or a few bars. Thus, in effect, a new form of loop pedal is proposed, which is targeted at situations in which the chord grid is known in advance. In that case, the chord grid is specified to the pedal (i.e. the device) through the chord grid interface (e.g. through any GUI, or by selecting from a library of chord grids, etc.). A typical example for a chord grid is a blues, e.g. “C7|C7|C7|C7|F7|F7|C7|C7|G7|F7|C7|C7|” (or something like that).
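As an illustration of how a chord grid given in the bar-separated notation above could be read in, the following sketch splits the blues example into bars and lists the distinct chords that the device would eventually need audio for. The function names are hypothetical, and the sketch assumes exactly one chord per bar, as in the example.

```python
def parse_chord_grid(text):
    """Split a '|'-separated chord grid into a list of chord symbols, one per bar."""
    return [bar.strip() for bar in text.split("|") if bar.strip()]


def distinct_chords(grid):
    """Return the distinct chords of the grid in order of first appearance."""
    return list(dict.fromkeys(grid))


blues_grid = parse_chord_grid("C7|C7|C7|C7|F7|F7|C7|C7|G7|F7|C7|C7|")
needed = distinct_chords(blues_grid)
```

For the blues example, the grid has 12 bars but only three distinct chords (C7, F7 and G7), which is why a single recorded chord plus transformations can cover the whole loop.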
The idea is that instead of playing the whole loop entirely, the “enhanced pedal” now only needs to record the first bar (or chord) or the first bars (or chords), for instance a C7 chord, played in whatever style. Thus, a musician actually plays only one or more chords, and these played bar(s) or chord(s) is (are) then transformed digitally, for instance using known pitch scaling algorithms, in this example into F7 and G7. As a consequence, the user can start improvising right away after the first bar(s) or chord(s), i.e. much faster than with known loop pedals. While generally the at least one played chord is played live and in real-time, it is generally possible that the at least one chord is played and recorded in advance and is, for generating the actual accompaniment, received as pre-recorded input, e.g. via a data interface or microphone.
Several problems generally occur when pitch scaling an audio signal: the frequency bins in the original signal shall not change, the phase of the output shall be coherent, and the transients shall not be stretched. Phase vocoding is an algorithm that uses Short Time Fourier Transform (STFT) and Overlap-And-Add (OLA), and recalculates the phase of the signal. As a drawback, the phase vocoder degrades (smears) the transients and adds a reverberation effect to the output. SOLA (Synchronous Overlap-And-Add) improves phase vocoding by synchronizing the analysis/synthesis frames used for OLA with the fundamental frequency of the signal. Its efficiency depends on the type of the input signal, and complex sounds will be harder to scale (monophonic sounds will be easier to scale). Another method uses granular (re)synthesis, coupled with a transient detector, to leave the transients un-stretched (the recent IRCAM's Mach Five uses this technology). In the case of speech, other algorithms show very good results, such as linear prediction based vocoders or the PSOLA algorithm (Pitch Synchronous Overlap and Add).
Thus, an algorithm is preferably used by the music generator 93 that generates a sequence of audio accompaniment, given an a priori chord grid, and partial audio chunks, corresponding to some of the chords of the sequence. In practice, the musician can play only the first one or more bars, or, during his performance, play other bars anywhere in the chord grid (played in loop). The algorithm generates an audio accompaniment given these incomplete audio inputs. In an embodiment the output of this algorithm is constantly updated (e.g. at every bar).
Preferably, the algorithm tries to minimize the number of transformations and their range (it is better to transpose as little as possible to minimize artefacts) in the generated audio accompaniment. A transformation generally is a substitution, a transposition or a combination of a substitution with a transposition. The “range” refers to the transposition, and the range of a transposition is the frequency ratio between the original frequency and the transposed frequency. For a small change in frequency, e.g., transpositions of one semitone, the audio quality is almost perfect; for larger changes in frequency, e.g., transposition by a fifth, the audio quality is degraded. The use of a substitution may create an odd feeling (what is played does not necessarily match perfectly the expected harmony . . . ). Therefore, the aim of the disclosed approach is to minimize the number of transformations to avoid both “odd harmonies” due to substitutions and “audio degradations” due to transpositions.
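The “range” of a transposition can be made concrete with a short calculation. The sketch below assumes twelve-tone equal temperament, which the text does not state explicitly; under that assumption a transposition by n semitones multiplies every frequency by 2^(n/12). The function name is hypothetical.

```python
def transposition_ratio(semitones):
    """Frequency ratio of a transposition by `semitones`, assuming
    twelve-tone equal temperament (an assumption of this sketch)."""
    return 2.0 ** (semitones / 12.0)


one_semitone = transposition_ratio(1)  # roughly 1.06, a small change
a_fifth = transposition_ratio(7)       # roughly 1.50, a large change
```

A one-semitone transposition changes frequencies by about 6 percent, whereas a transposition by a fifth changes them by about 50 percent, which is consistent with the stronger degradation described above for larger ranges.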
Moreover, the algorithm can use “chord substitutions” to avoid transpositions when possible. For instance, instead of a C major seven, one could use an E minor, etc. A complete list of substitutions is given in
In an embodiment the music interface 92 comprises a start-stop interface for starting and stopping the reception and/or recording of chords played by a musician. Said start-stop interface may e.g. comprise a pedal. Further, in an embodiment said chord interface 91 is a user interface for entering a chord grid and/or selecting a chord grid from a chord grid database. Further, a music output interface, e.g. a loudspeaker, may be provided that is able to output the generated music accompaniment.
Generally, the musician (or someone else) decides in advance which chord progression to follow and which chords to play. In another embodiment, however, a unit configured to receive audio input and classify it as a certain chord of the chord grid is provided. Further, in an embodiment a unit for storing received and generated music may be provided.
In the following a description of a generalizing harmonization device will be provided in the context of improvised tonal music, such as, but not limited to, bebop jazz. In this context, a chord progression (also referred to as “chord grid” herein) is decided before starting the actual improvisation, e.g. by selection from a list displayed in a corresponding user interface. The chord progression defines the harmony of each bar of the tune. During the improvisation one or several musicians play together following the harmonies specified by the chord progression. Typically, one of the musicians plays an accompaniment, for instance chords, while another one simultaneously plays a solo melody, in the same harmony.
Generally, a harmonization device generates an accompaniment for one or more musicians improvising on a predefined chord progression. The accompaniment fits the harmonic structure of the corresponding bar in the chord progression. A harmonization device can, for instance, synthesize a chord using a MIDI synthesizer, or play back pre-recorded music.
Conventionally, such devices deal with pre-recorded music. The device takes two inputs: i) a chord database D of pre-recorded bars, each bar having a specific harmonic structure, and ii) a chord progression P. A known device outputs a musical accompaniment comprising a sequence of pre-recorded bars of D. The accompaniment is meant to be played back during an improvisation. Note that tempo issues are neglected herein. Further, it shall be assumed that the musical bars in the database D are preferably recorded at the same tempo as that of the improvisation.
The following table gives the chord progression of a simple blues. Each bar contains one chord, but some progressions typically specify 1, 2, 3, or 4 chords per bar.
If a database a Dblues that contains five bars b1, . . . , b5 is considered with respective harmonic structure C7, F7, A7, D min 7, and G7, a simple harmonization device will play back the sequence of bars: b1, b2, b1, b1, b2, b2, b1, b3, b4, b5, b1, b5 during the improvization. In this case, the database Dblues is said to be complete with respect to the chord progression Pblues, as for every chord in Pblues there is a corresponding bar in a Dblues.
If an incomplete database D′blues consisting of three bars b1, b2, b3 with respective harmonic structures C7, F7, and G7 is considered, a simple harmonization device will play back the sequence of bars b1, b2, b1, b1, b2, b2, b1, -, -, b3, b1, b3 during the improvisation. In this sequence “-” means that nothing is played back.
In this case, the database D′blues is said to be incomplete with respect to the chord progression Pblues, as there is not a corresponding bar in D′blues for every chord in Pblues.
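The behaviour of the simple harmonization device on complete and incomplete databases may be sketched as follows. This is a minimal illustration only: the names `P_BLUES`, `D_BLUES`, `D_PRIME_BLUES`, `harmonize` and `is_complete` are hypothetical, chords are plain strings, and the blues progression is inferred here from the bar sequence b1, b2, b1, b1, ... quoted above rather than taken from a table.

```python
from typing import Dict, List, Optional

# Assumption: progression inferred from the play-back sequence given in the text.
P_BLUES: List[str] = ["C7", "F7", "C7", "C7", "F7", "F7",
                      "C7", "A7", "Dmin7", "G7", "C7", "G7"]

# Databases map a harmonic structure to the label of a pre-recorded bar.
D_BLUES: Dict[str, str] = {"C7": "b1", "F7": "b2", "A7": "b3", "Dmin7": "b4", "G7": "b5"}
D_PRIME_BLUES: Dict[str, str] = {"C7": "b1", "F7": "b2", "G7": "b3"}


def harmonize(progression: List[str], database: Dict[str, str]) -> List[Optional[str]]:
    """Return one bar label per chord, or None where nothing can be played back."""
    return [database.get(chord) for chord in progression]


def is_complete(progression: List[str], database: Dict[str, str]) -> bool:
    """A database is complete if every chord of the progression has a bar."""
    return all(chord in database for chord in progression)


if __name__ == "__main__":
    print(harmonize(P_BLUES, D_BLUES))
    print(harmonize(P_BLUES, D_PRIME_BLUES))
    print(is_complete(P_BLUES, D_BLUES), is_complete(P_BLUES, D_PRIME_BLUES))
```

In this sketch the value `None` plays the role of the “-” entries in the sequence obtained from the incomplete database.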
The disclosed Generalizing Harmonization Device (GHD) aims at generalizing the simple harmonization device presented above to incomplete databases. A GHD uses chord substitution rules and/or chord transposition mechanisms, as explained herein, to generate accompaniments from incomplete databases. A chord c, in this context, consists of a root pitch-class and a harmonic type. This is written as c=(r; t). For instance, chord C7 has root note r(C)=C and is of type t(C)=7, i.e., C7=(C; 7).
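The notation c=(r; t) may be mirrored in code by a small record type. The following is a sketch under stated assumptions: the names `Chord`, `parse_chord` and `format_chord` are hypothetical, roots are encoded as pitch classes 0 to 11 with C = 0, and only sharp spellings (not flats such as Bb) are handled.

```python
from typing import NamedTuple

# Assumption: pitch classes are numbered 0..11 starting at C, spelled with sharps.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


class Chord(NamedTuple):
    root: int    # root pitch class r, 0 = C
    ctype: str   # harmonic type t, e.g. "7", "maj7", "min7"


def parse_chord(name: str) -> Chord:
    """Parse chord names such as 'C7', 'D min 7' or 'F#7'."""
    compact = name.replace(" ", "")
    # A sharp root occupies two characters, a natural root one.
    root_len = 2 if compact[1:2] == "#" else 1
    return Chord(PITCH_CLASSES.index(compact[:root_len]), compact[root_len:])


def format_chord(chord: Chord) -> str:
    """Inverse of parse_chord, without spaces."""
    return PITCH_CLASSES[chord.root] + chord.ctype


if __name__ == "__main__":
    c7 = parse_chord("C7")
    print(c7, format_chord(c7))
```

With this encoding, C7=(C; 7) becomes `Chord(root=0, ctype="7")`.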
The transposition mechanism may use an existing digital signal processing algorithm to change the frequency of an audio signal. The input of the algorithm is the audio signal of a played chord, e.g., C maj, and a number of semitones to transpose, e.g., +3. The output is an audio signal of the same duration as the input audio signal, whose content is a transposed chord of the same type, here D# maj, as D# is 3 semitones above C.
τn denotes the transposition by n semitones. For instance, τ−2 is a transposition of two semitones (i.e., one tone) down, and τ+3 is a transposition of three semitones (i.e., a minor third) up.
Transposing a musical signal entails a certain loss in audio quality. The loss increases with the difference in frequency between the original and the target signal. Therefore, each transposition τs may be associated with a cost c(τs), which mostly depends on s. For instance, c(τs)=|s| may be used.
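The symbolic effect of a transposition on a chord, together with the exemplary cost equal to the absolute number of semitones, may be sketched as follows. The audio processing itself is not modelled; the names `transpose` and `transposition_cost` and the encoding of a chord as a pair (root pitch class with C = 0, type) are assumptions made for illustration.

```python
from typing import Tuple

# Assumption: a chord is a pair (root pitch class 0..11 with C = 0, harmonic type).
Chord = Tuple[int, str]


def transpose(chord: Chord, semitones: int) -> Chord:
    """Symbolic counterpart of the audio transposition by the given semitones."""
    root, ctype = chord
    # The root moves around the 12 pitch classes; the type is unchanged.
    return ((root + semitones) % 12, ctype)


def transposition_cost(semitones: int) -> int:
    """Exemplary cost: the absolute number of semitones."""
    return abs(semitones)


if __name__ == "__main__":
    c_maj = (0, "maj")
    print(transpose(c_maj, 3), transposition_cost(3))   # root 3 is D#
    print(transpose((0, "7"), -2), transposition_cost(-2))
```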
It is a common practice, especially in jazz, to use chord substitutions when improvising. Substituting one chord for another is a way to increase variety and create novelty in a performance. The substituted chords have a common harmonic quality with the original chord; for instance, they usually have several notes in common, and the bass of the original chord usually belongs to the substituted chord. A substitution rule is an abstract operation that does not affect the audio content. Instead, it can be seen as a mere rewriting rule.
For instance, rule σ1 (as shown in
The rules are all written with a left part in C, i.e., the chord on the left has pitch class C as its root. This is a handy way of writing the rules. However, the rules apply to any root, not only C. In other words, each rule represents a set of 12 rules, one for each root for the left chord. The 12 rules can easily be found by transposing the right chord as shown in
σi(c) is written to represent the chord obtained by applying rule σi to chord c. For instance σ1(A7)=E min 7.
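The way a rule written with a left part in C extends to all twelve roots may be sketched as follows. The encoding is an assumption: a rule is stored as the left chord type, the interval in semitones from the left root to the right root, and the right chord type. The shape of `SIGMA_1` (C7 → G min 7) is inferred solely from the example σ1(A7)=E min 7 above, and the names `Rule` and `apply_rule` are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

# Assumption: a chord is a pair (root pitch class 0..11 with C = 0, harmonic type).
Chord = Tuple[int, str]


class Rule(NamedTuple):
    left_type: str      # type of the chord on the left, written with root C
    right_offset: int   # semitones from the left root to the right root
    right_type: str     # type of the chord on the right


# Inferred from sigma1(A7) = E min 7: with a left part in C, this reads C7 -> G min 7.
SIGMA_1 = Rule(left_type="7", right_offset=7, right_type="min7")


def apply_rule(rule: Rule, chord: Chord) -> Optional[Chord]:
    """Apply the rule to a chord of any root; None if the type does not match."""
    root, ctype = chord
    if ctype != rule.left_type:
        return None
    return ((root + rule.right_offset) % 12, rule.right_type)


if __name__ == "__main__":
    a7 = (9, "7")                    # A has pitch class 9
    print(apply_rule(SIGMA_1, a7))   # root 4 is E
```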
Each chord substitution creates an unexpected effect on the listener. The effect is more or less unexpected depending on the substitution rule applied, as some substitutions are more usual than others, and as some substituted chords share more harmonic qualities with the original chord than others. Each substitution rule σi is associated with a cost c(σi) that accounts for this.
A chord transformation is preferably the composition of a single chord substitution with a single chord transposition. Given any two chords c1=(r1, t1) and c2=(r2, t2), each transformation τj∘σi from c1 to c2 has a cost c(τj)+c(σi).
A generalizing harmonization device generates accompaniments for a chord progression from a database of pre-recorded bars, even if the database is not complete for the target chord progression. For the chords in the chord progression that have no corresponding bar in the database, the GHD uses chord transformations to generate content to play back. The GHD uses selection algorithms to select the best transformations to apply for a given chord.
The substitution rule set is said to be complete with respect to the chord types if for any two chord types t1 and t2, there is a rule σi whose left part is of type t1 and whose right part is of type t2. The substitution rule set shown in
In the following, three exemplary algorithms are shown that provide primitives for building applications of the generalizing harmonization device. Algorithm 1 computes and returns the set consisting of the best transformations of a chord C1 to another chord C2, given a set Σ of substitutions. In the algorithm, r(Ci) denotes the root note of chord Ci and t(Ci) denotes its type.
Algorithm 2 uses Algorithm 1 and computes the minimum cost to transform a chord C1 into a chord C2.
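One possible reading of Algorithms 1 and 2 is sketched below; it is not the listing of the disclosure. Assumptions: a chord is a pair (root pitch class with C = 0, type); a transformation first applies a substitution and then the shortest transposition reaching the target root; the transposition cost is the absolute number of semitones. The rule set `RULES` is hypothetical: the identity rule and the cost of `sigma1` are invented for illustration, whereas σ23: C min 7→F7 with cost 1 is taken from the description.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple

Chord = Tuple[int, str]  # (root pitch class 0..11 with C = 0, harmonic type)


class Rule(NamedTuple):
    name: str
    left_type: str
    right_offset: int   # semitones from the left root to the right root
    right_type: str
    cost: int


def shortest_shift(from_root: int, to_root: int) -> int:
    """Smallest transposition in semitones (range -5..+6) between two roots."""
    d = (to_root - from_root) % 12
    return d if d <= 6 else d - 12


def best_transformations(c1: Chord, c2: Chord,
                         rules: Sequence[Rule]) -> Tuple[float, List[Tuple[str, int]]]:
    """Sketch of Algorithm 1: minimal cost and all (rule name, shift) achieving it."""
    scored = []
    for rule in rules:
        if rule.left_type != c1[1] or rule.right_type != c2[1]:
            continue
        substituted_root = (c1[0] + rule.right_offset) % 12
        shift = shortest_shift(substituted_root, c2[0])
        scored.append((rule.cost + abs(shift), rule.name, shift))
    if not scored:
        return math.inf, []
    best = min(cost for cost, _, _ in scored)
    return best, [(name, shift) for cost, name, shift in scored if cost == best]


def min_cost(c1: Chord, c2: Chord, rules: Sequence[Rule]) -> float:
    """Sketch of Algorithm 2: minimum cost to transform c1 into c2."""
    return best_transformations(c1, c2, rules)[0]


RULES = [
    Rule("id7", "7", 0, "7", 0),          # hypothetical identity rule (pure transposition)
    Rule("sigma1", "7", 7, "min7", 2),    # shape inferred from the text; cost 2 is hypothetical
    Rule("sigma23", "min7", 5, "7", 1),   # C min 7 -> F7, cost 1 as stated in the description
]

if __name__ == "__main__":
    print(best_transformations((9, "min7"), (2, "7"), RULES))  # A min 7 -> D7
    print(min_cost((0, "7"), (2, "7"), RULES))                 # C7 -> D7
```

A tritone shift is reported as +6 only; a transposition by −6 would have the same cost under this cost model.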
Algorithm 3 takes two inputs: 1) a target chord c and 2) a database D={C1, . . . , Cn} of pre-recorded bars. It computes the set consisting of all pairs <Ci, τ∘σ> such that Ci∈D, τ∘σ(Ci)=c, and the cost c(τ∘σ) is minimal.
D = {D1,..., Dm} is the database
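The selection performed by Algorithm 3 may be sketched as follows, with the cost of transforming a database chord into the target supplied as a function. This is a simplified illustration: it returns the indices of the cheapest bars instead of the pairs with their transformations, and the stand-in cost `transposition_only_cost` (same chord type required, cost equal to the shortest transposition) is an assumption that ignores substitutions. The names `best_candidates` and `transposition_only_cost` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Chord = Tuple[int, str]  # (root pitch class 0..11 with C = 0, harmonic type)


def transposition_only_cost(source: Chord, target: Chord) -> float:
    """Stand-in cost: shortest transposition if the types match, else infinite."""
    if source[1] != target[1]:
        return math.inf
    d = (target[0] - source[0]) % 12
    return min(d, 12 - d)


def best_candidates(target: Chord, database: Sequence[Chord],
                    cost_fn: Callable[[Chord, Chord], float]) -> Tuple[float, List[int]]:
    """Minimal cost and the indices of all database bars reaching it."""
    costs = [cost_fn(bar_chord, target) for bar_chord in database]
    best = min(costs, default=math.inf)
    if best == math.inf:
        return math.inf, []
    return best, [i for i, cost in enumerate(costs) if cost == best]


if __name__ == "__main__":
    database = [(0, "7"), (5, "7"), (7, "7")]   # C7, F7, G7
    print(best_candidates((9, "7"), database, transposition_only_cost))     # target A7
    print(best_candidates((2, "min7"), database, transposition_only_cost))  # target D min 7
```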
The generalizing harmonization device may be used in different practical contexts. For instance, in some application contexts, a database of recorded chords is available before the improvisation starts. In other application contexts, the database may be recorded during the improvisation phase. These different contexts call for different strategies for the generation of an accompaniment by the generalizing harmonization device.
In the simplest case, a database of prerecorded chords is available and no constraint is set on transformation costs. In this case, a cost-optimal complete accompaniment may be generated with the following straightforward strategy: For each chord in the progression, play back one of the best chords available, using Algorithm 3 to determine the “best” chords. This strategy guarantees that the accompaniment minimizes the transformation cost at each bar. Algorithm 4 implements this strategy:
In Algorithm 4, P = {C1, ..., Cn} is the chord progression, the bar to play back is chosen randomly among the best candidates, and τ(Di) is the actual acoustic transposition of Di.
If a constraint is set on cost, e.g., the transformation cost cannot exceed a threshold, a complete accompaniment cannot necessarily be generated. The following strategy generates an exemplary accompaniment that is not complete, but guarantees that the transformation costs never exceed the threshold value. It consists in playing back one of the best available chords if the cost does not exceed the cost threshold, and in playing nothing otherwise, using Algorithm 5:
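In Algorithm 5, cmax is the maximum transformation cost allowed. If no bar can be transformed with cost ≦ cmax, nothing is played back; otherwise, the bar to play back is chosen randomly among the best candidates.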
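The two play-back strategies described above, without and with a cost threshold, may be sketched in a single function. This is an illustration under assumptions, not the listings of Algorithms 4 and 5: the transformation cost is supplied as a function, the stand-in `transposition_only_cost` ignores substitutions, and the names `accompany` and `c_max` are hypothetical. Leaving `c_max` at infinity corresponds to the unconstrained strategy.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Chord = Tuple[int, str]  # (root pitch class 0..11 with C = 0, harmonic type)


def transposition_only_cost(source: Chord, target: Chord) -> float:
    """Stand-in cost: shortest transposition if the types match, else infinite."""
    if source[1] != target[1]:
        return math.inf
    d = (target[0] - source[0]) % 12
    return min(d, 12 - d)


def accompany(progression: Sequence[Chord], database: Sequence[Chord],
              cost_fn: Callable[[Chord, Chord], float],
              c_max: float = math.inf,
              rng: Optional[random.Random] = None) -> List[Optional[Tuple[int, float]]]:
    """For each chord, pick (bar index, cost) among the cheapest bars, or None."""
    if rng is None:
        rng = random.Random()
    plan: List[Optional[Tuple[int, float]]] = []
    for target in progression:
        costs = [cost_fn(bar_chord, target) for bar_chord in database]
        best = min(costs, default=math.inf)
        if best == math.inf or best > c_max:
            plan.append(None)          # nothing is played back on this chord
            continue
        cheapest = [i for i, cost in enumerate(costs) if cost == best]
        plan.append((rng.choice(cheapest), best))   # random choice among the best
    return plan


if __name__ == "__main__":
    database = [(0, "7"), (5, "7"), (7, "7")]                 # C7, F7, G7
    progression = [(0, "7"), (9, "7"), (2, "min7"), (7, "7")]  # C7, A7, D min 7, G7
    print(accompany(progression, database, transposition_only_cost))
    print(accompany(progression, database, transposition_only_cost, c_max=1))
```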
Generalizing harmonization devices can be applied to reflexive loop pedals. In this case, they allow a reflexive loop pedal to be used in a much more flexible and entertaining way, by considerably shortening the feeding phase.
In the context of a reflexive loop pedal, a musician may improvise on a chord progression. The bars during which the musician plays chords may be recorded by the reflexive loop pedal to feed a database. The bars in the database may be played back by the reflexive loop pedal when the musician plays a solo melody (or bass) to provide a harmonic support, or accompaniment, to the solo. In this context, for a given bar in the chord progression, the loop pedal only plays an accompaniment if the database contains at least one bar with the corresponding harmonic structure. To ensure that a conventional loop pedal will provide a complete accompaniment, the musician must start by feeding the database with at least one chord for every harmonic structure present in the chord progression. This may create a sense of boredom for the musician as well as for the audience.
For instance, consider John Coltrane's Giant Steps, a particularly complex chord progression, shown in the table below.
Giant Steps is a 16-bar progression with 9 different chords: B maj7, D7, G maj7, Bb7, Eb maj7, A min 7, F#7, F min 7, and C# min 7. Moreover, almost every bar has a unique harmonic structure in this tune. Therefore, to ensure a complete accompaniment on Giant Steps, the musician has to play chords during most of the bars of one whole execution of the chord progression. It will now be shown that a GHD according to the present disclosure may allow the feeding phase to be dramatically reduced.
In the context of a reflexive loop pedal, more complex accompaniment strategies may be implemented, as the musician has to follow the chord sequence defined by the tune from left to right. Consider the chord progression PGS shown in the above table. The musician starts improvising on this chord progression with an empty database DGS. Here is a scenario that allows the musician to get a complete accompaniment from the GHD:
In the scenario above, only two chords, C1 and C2, were played to ensure that the GHD plays a complete accompaniment. This scenario, however, places no restriction on the transformation costs. If the transformation cost is limited to, say, 4, chord A min 7 on bar 4 can no longer be obtained by τ−4∘σ25(C1), whose cost is 5. A less expensive transformation is σ23: the rule σ23: C min 7→F7 is equivalent to A min 7→D7 after changing the roots. Therefore, the GHD can play C2 during the first half of bar 4. The corresponding cost is c(σ23)=1.
The example and scenario above raise a question: given a maximum cost cmax and a chord progression, what is the strategy that minimizes the number of chords that have to be played by the musician in order to guarantee a complete accompaniment using only transformations with costs under cmax? This is actually a complex combinatorial problem (NP-hard, as known to computer scientists), proven equivalent to general set covering, which cannot be solved in reasonable time by any known algorithm (unless P=NP).
It is therefore proposed to follow a greedy approach to find a sub-optimal solution in real-time. Algorithm 6 computes a sequence of indices. Each index is the position of a chord in the target chord progression. It is sufficient that the musician plays chords at every specified position to ensure that the GHD will perform a complete accompaniment.
In Algorithm 6, cmax is the maximum transformation cost allowed.
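A greedy computation of the positions at which the musician has to play may be sketched as follows. This is one plausible left-to-right reading of the greedy approach, not the listing of Algorithm 6: a position is added whenever no previously played chord can be transformed into the current chord within the allowed cost. The names `greedy_feeding_positions` and `transposition_only_cost` are hypothetical, and the stand-in cost ignores substitutions, so the results differ from those obtained with the substitution rules of the disclosure.

```python
import math
from typing import Callable, List, Sequence, Tuple

Chord = Tuple[int, str]  # (root pitch class 0..11 with C = 0, harmonic type)


def transposition_only_cost(source: Chord, target: Chord) -> float:
    """Stand-in cost: shortest transposition if the types match, else infinite."""
    if source[1] != target[1]:
        return math.inf
    d = (target[0] - source[0]) % 12
    return min(d, 12 - d)


def greedy_feeding_positions(progression: Sequence[Chord],
                             cost_fn: Callable[[Chord, Chord], float],
                             c_max: float) -> List[int]:
    """Return the 1-based positions at which the musician has to play a chord."""
    database: List[Chord] = []
    positions: List[int] = []
    for index, target in enumerate(progression, start=1):
        covered = any(cost_fn(bar_chord, target) <= c_max for bar_chord in database)
        if not covered:
            positions.append(index)     # the musician plays this chord ...
            database.append(target)     # ... and it is recorded into the database
    return positions


if __name__ == "__main__":
    C7, F7, A7, DM7, G7 = (0, "7"), (5, "7"), (9, "7"), (2, "min7"), (7, "7")
    blues = [C7, F7, C7, C7, F7, F7, C7, A7, DM7, G7, C7, G7]
    print(greedy_feeding_positions(blues, transposition_only_cost, c_max=2))
    print(greedy_feeding_positions(blues, transposition_only_cost, c_max=5))
```

Because each decision only depends on the chords already recorded, this sketch can be evaluated while the musician is playing.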
However, better strategies, i.e., strategies using fewer chords, could be obtained by a complete search based on any backtracking algorithm. The greedy algorithm, applied to PGS with cmax=3, yields the sequence (1, 2, 3, 5). The following table shows the corresponding execution, with the transformations used and their respective cost. The musician has to play chords, say c1, c2, c3, and c4 (indicated in bold in the table), on positions 1, 2, 3 and 5 in the chord progression, i.e., B maj7, D7, G maj7, and Eb maj7. Transformations of c1, c2, c3, and c4 are then used by the GHD to play chords on the rest of the progression. The musician can therefore play a solo melody on top of the accompaniment. The feeding phase is reduced to two and a half bars.
The greedy algorithm, applied to PGS with cmax=4, yields the sequence (1, 2). The following table shows the corresponding execution, with the transformations used and their respective cost. The musician has to play chords, say c1 and c2, on the first two positions in the chord progression, i.e., B maj7 and D7. Transformations of c1 and c2 are then used by the GHD to play/generate chords on the rest of the progression. The musician can therefore play a solo melody on top of the accompaniment. The feeding phase is reduced to one bar.
In summary, the present disclosure describes a simple device and method that preferably use audio transformations and/or musical chord substitution rules to perform rich harmonization and/or real-time music accompaniments from incomplete audio material. Real-time in this context is not limited to situations in which the chord(s) is (are) being played live by the musician; the chord(s) may alternatively be played in a feeding phase for providing a few pre-recorded bars. Some or all of the transformations may be calculated in advance as well, and the actual “real-time” accompaniment of the musician may then be performed at a later time, based on the pre-recorded chord(s). Further, situations in which the system is able to start the accompaniment after having received only a few bars, so that there would still be a delay of a few bars, shall be understood to be covered by the disclosed real-time accompaniments.
Obviously, numerous modifications and variations of the present disclosure are possible in light of the above teachings. It is therefore to be understood that within the scope of the appended claims, the disclosure may be practiced otherwise than as specifically described herein.
In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
In so far as embodiments of the disclosure have been described as being implemented, at least in part, by software-controlled data processing apparatus, it will be appreciated that a non-transitory machine-readable medium carrying such software, such as an optical disk, a magnetic disk, semiconductor memory or the like, is also considered to represent an embodiment of the present disclosure. Further, such software may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
The elements of the disclosed devices, apparatus and systems may be implemented by corresponding hardware and/or software elements, for instance appropriate circuits. A circuit is a structural assemblage of electronic components including conventional circuit elements, integrated circuits including application specific integrated circuits, standard integrated circuits, application specific standard products, and field programmable gate arrays. Further, a circuit includes central processing units, graphics processing units, and microprocessors which are programmed or configured according to software code. A circuit does not include pure software, although a circuit includes the above-described hardware executing software.
Any reference signs in the claims should not be construed as limiting the scope.
A list of further embodiments follows:
Number | Date | Country | Kind |
---|---|---|---|
12195673.4 | Dec 2012 | EP | regional |
13161056.0 | Mar 2013 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2013/075695 | 12/5/2013 | WO | 00 |