The present invention relates to methods, devices and computer program products for analyzing parameters of a musical performance.
Learning to play an instrument, or to sing, requires years of practice. In order to progress as fast as possible an aspiring musician will usually take lessons from a teacher who can give constructive feedback and who can tell the student what to work on. However, for the student, lessons are rare compared to practice sessions. A student will practice on several days between lessons and on those days the student does not receive any help or feedback. However, with sophisticated software running on a data processing device it is possible to create a virtual teacher that can determine some parameters of a musical performance without the need for a human assessment. A number of virtual teacher programs exist. A problem with the prior art virtual teacher programs is that the output from them is quite limited, at least during a live performance. More detailed analyses of the musical performance may be available in a summary section of the program, but lack of real-time guidance remains a problem. For instance, a virtual teaching program that analyses the performance after the performance has ended provides little useful feedback to the musician regarding what changes in body or hand position caused what changes in performance.
It is an object of the present invention to alleviate one or more of the problems identified above. Specifically, it is an object of the present invention to provide methods, equipment and computer program products that provide improvements with regard to one or more of analysis of a musical performance, in particular a real-time analysis of a live performance in a manner which enables a musician to correct aspects of playing during the live performance, whereby the invention and/or its embodiments bring about improvements to the motor skills of the musician.
An aspect of the present invention is a method comprising analyzing a performance of a musical piece wherein the musical piece comprises a sequence of musical events, wherein at least some of the musical events have a plurality of parameters, wherein the parameters are selected from a group which comprises timing, pitch and dynamics;
wherein said analyzing comprises execution of the following acts on a data processing system:
In some implementations the performance of the musical piece is a live performance and the analyzing comprises real-time execution of the acts of the inventive method. In the present context, real-time analysis and presentation of a live performance means fast enough so that the musician may learn, during the performance, which parameters of their performance are off target and in which direction, so that the musician can get quick feedback regarding how changes in their technique cause changes in the parameters that indicate correctness of the performance. While a real-time analysis of a live performance is no serious challenge to a modern data-processing system, the real-time analysis has consequences as regards legibility and clarity of the information provided to the musician. For instance, the classification of the parameters of the event, and the trend of classification, should be immediately apparent to the musician. Is the musician playing sharp? Or flat? At which point does the performance shift from early to late? Or from too loud to too quiet? Such problems relating to presentation can be solved by features of the invention and/or its embodiments which improve quick and clear indications of trends in the playing. For instance, it is important that new markers indicating classification of recent parameters of musical events are easily distinguished from markers of earlier parameters.
In some implementations the set of classes also comprises a “missed” class for events that were omitted or events in which the parameter of the musical performance deviates from the corresponding parameter of the standard performance by more than a predetermined threshold.
For human test subjects, missed events may arise from entirely different errors than poorly played events, such as notes. Poor playing typically reflects poor technique while missed events typically reflect poor reading of music, lack of attention or both. Accordingly, it is beneficial to treat and display misses separately from poorly played events. The desire to provide immediately apparent information can be satisfied by treating events as “misses” if the error is so big that the probable reason for the error was not poor playing but inattention or poor reading of music.
In a typical but non-restrictive implementation, the parameters comprise timing, pitch and level. For timing, the “low” and “high” classes mean early and late, respectively. For pitch, “low” class means flat and “high” class means sharp. For level, “low” class means too quiet and “high” class means too loud.
In some implementations the step size towards the corner corresponding to the classification may depend on other factors besides the distance from the previous marker to the corner for the determined class. For instance, if the classification changes from “low” to “high” or vice versa, the step size may be longer than usual. In an illustrative but non-restrictive example, the step from “low” to “high” or vice versa may be implemented in such a manner that the end point of the step towards the corner of the new, changed, classification is calculated as if the starting point was on the line of correct performance. For instance, assume that the vertical y axis bisects the area into sections of “low” and “high” parameters.
The end point for the step corresponding to the change in classification is calculated from the vertical y axis. But because no marker is placed on the y axis, the visible step length is the sum of the steps from the previous marker to the y axis and from the y axis to the new end position.
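The class-change step can be illustrated with a minimal sketch. It assumes that the “low” and “high” corners lie at x = −1 and x = +1, that the y axis is the line x = 0, and that the step is a fixed fraction g of the remaining distance; the function name and the default value of g are hypothetical and only the x coordinate is handled.

```python
def step_across_axis(x_prev, x_corner, g=0.25):
    """Return (x_new, visible_step) for one marker update along the x axis.

    Assumption: the y axis (x = 0) separates the "low" and "high" halves.
    If the previous marker and the target corner are on opposite sides of
    that axis, the end point is computed as if the step started on the axis.
    """
    changed_side = x_prev * x_corner < 0
    start = 0.0 if changed_side else x_prev
    x_new = start + g * (x_corner - start)
    # No marker is drawn on the axis, so the visible step spans both parts.
    return x_new, abs(x_new - x_prev)


# A marker at x = -0.5 ("low" side) followed by a "high" classification.
x_new, visible = step_across_axis(-0.5, 1.0)
```

In this sketch the marker jumps from −0.5 to 0.25, a visible step of 0.75, whereas a step that stays on the same side (from 0.5 towards +1) is only 0.125 long.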
In some implementations the step size towards the corner of the classification may depend on a match between the compared parameter and the determined class. What this means is that the larger the error, the larger the step towards the corner of the classification, until the error is so large that the event is classified as a “miss”.
A great deal of useful information about a student's performance can be obtained by analyzing musical events with respect to three independent parameters or components, namely timing, pitch, and dynamics. In some implementations of the invention, timing has two components: a start time and a duration of an event. Some elements of phrasing, such as vibrato or tremolo, cannot be adequately analyzed by these three parameters as mutually independent components. The inventors have discovered, however, that analyzing timing, pitch, and dynamics as mutually independent parameters (components) provides valuable improvements over prior art techniques. Some ambitious implementations of the invention also evaluate phrasing, such as vibrato or tremolo. Like many musical terms, vibrato and tremolo are subject to a variety of different definitions. For the purposes of the present invention, vibrato is a function of pitch versus time (frequency modulation) and tremolo is a function of dynamics (loudness) versus time (amplitude modulation). Evaluation of vibrato involves analyzing the center, depth and/or rate of the vibrato. This can be done, for instance, by using a smoothed version (e.g. a sliding average) of the particular parameters. The center frequency of the vibrato is the average or median frequency around which the vibration oscillates. The modulation depth is the amount of the frequency change above and below the center frequency. The rate means the repetition rate or an inverse of the repetition period. For tremolo the parameters are analogous with those of vibrato but an average level is used instead of center frequency and the depth refers to change in level above and below the average level.
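As a rough illustration of the vibrato parameters named above, the following sketch estimates center, depth and rate from a pitch track. It assumes that the pitch track is already available as one frequency value per analysis frame; the function name, the use of a plain mean as the center and the zero-crossing rate estimate are illustrative choices, not features taken from the description.

```python
import math


def analyze_vibrato(pitch_hz, frame_rate_hz):
    """Estimate (center, depth, rate) of a vibrato from a pitch track.

    pitch_hz: one pitch estimate per frame, in Hz (assumed input format).
    frame_rate_hz: number of pitch estimates per second.
    """
    # Center: average frequency around which the pitch oscillates.
    center = sum(pitch_hz) / len(pitch_hz)
    deviations = [p - center for p in pitch_hz]
    # Depth: half of the peak-to-peak excursion around the center.
    depth = (max(deviations) - min(deviations)) / 2.0
    # Rate: count upward crossings of the center and measure their spacing.
    crossings = [i for i in range(1, len(deviations))
                 if deviations[i - 1] < 0.0 <= deviations[i]]
    if len(crossings) < 2:
        rate = 0.0  # too short or too flat to estimate a rate
    else:
        periods = len(crossings) - 1
        rate = periods * frame_rate_hz / (crossings[-1] - crossings[0])
    return center, depth, rate


# Synthetic one-second track: 440 Hz center, +/-5 Hz depth, 6 Hz rate.
track = [440.0 + 5.0 * math.cos(2.0 * math.pi * 6.0 * n / 100.0)
         for n in range(100)]
center, depth, rate = analyze_vibrato(track, 100.0)
```

The same function could be applied to tremolo by passing a level track instead of a pitch track, in which case the center is the average level.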
The necessity of observing variations in pitch or dynamics causes an additional unavoidable delay. But for the purposes of the present invention, real-time processing should be understood in the context of improving the musician's motor skills. As long as implementations of the present invention provide feedback to the musician quickly enough for the musician to remember what change in playing was related to what change in performance, the processing can be considered real-time processing. For instance, if the musician notices that the markers for pitch suddenly begin to move towards the low (flat) corner, the musician may remember that two seconds ago his/her position changed. The musician is thus notified of the change in performance, a corrective measure may be taken, and the same mistake can be avoided in the future. Known techniques for measuring parameters relating to timing, pitch and dynamics are listed at the end of this description. The same techniques, namely observation of pitch and/or dynamics as a function of time over the duration of a note or a sequence of notes, can be used for evaluating other aspects of phrasing, such as staccato or legato.
Regarding determination of performance scores for the various parameters of a musical event, the data processing system compares each of the analyzed parameters with corresponding parameters of the standard performance. No difference between the analyzed musical performance and the standard performance obviously results in a perfect or best possible score, which on a normalized scale may be indicated as 1 or 100%. In some implementations a tolerance margin around a parameter's ideal value (the value of the corresponding parameter in the standard performance) may be applied. Larger differences from the parameter's ideal value may decrease the performance score. In an illustrative but non-restrictive implementation, increasing differences result in lower scores only up to a point. For instance, playing a clearly wrong note, playing the right note at the time of its neighbor note, or clearly playing forte instead of piano results in the lowest possible score for pitch, timing and dynamics (level), respectively. Any mistakes bigger than these may be treated as missed events instead of merely assigning a poor score to them.
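One possible reading of this scoring scheme is sketched below. The linear fall-off between the tolerance margin and the worst scored error is an assumption made for illustration, as are the function and parameter names; the description only requires that larger differences may decrease the score and that still larger ones may be treated as misses.

```python
def performance_score(played, ideal, tolerance, worst):
    """Return a normalized score in [0, 1], or None for a missed event.

    tolerance: largest error that still earns the perfect score of 1.
    worst: error that earns the lowest score of 0; anything larger is
           treated as a miss (assumed threshold, must exceed tolerance).
    """
    error = abs(played - ideal)
    if error <= tolerance:
        return 1.0
    if error > worst:
        return None  # handled as a missed event, not as a poor score
    # Assumed linear decrease from 1 at the tolerance to 0 at `worst`.
    return 1.0 - (error - tolerance) / (worst - tolerance)


# Timing example in seconds: 20 ms tolerance, 250 ms worst scored error.
on_time = performance_score(1.01, 1.00, 0.02, 0.25)
far_off = performance_score(1.50, 1.00, 0.02, 0.25)
```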
In the following section, specific embodiments of the invention will be described in greater detail in connection with illustrative but non-restrictive examples. A reference is made to the following drawings:
In the attached drawings and the following description, like reference numbers denote like items and a repeated description is omitted. The first digit of a two-part reference number indicates the drawing in which the element is first introduced. For instance, all elements having reference numbers 1-xx are introduced in
An exemplary data processing system (“computer”) will be described in connection with
Step 1-2: inputting the performance of the musical piece into the computer. The musical piece must be parsed into events, such as notes and pauses. There are several alternative techniques for doing this. In some implementations, the instrument may have a MIDI interface (Musical Instrument Digital Interface). MIDI interfaces are ubiquitous in connection with keyboard instruments, such as synthesizers, but there are other instruments with MIDI interfaces too. Output from a MIDI-enabled instrument is automatically parsed into events.
In some implementations, parsing from a continuous analog or digital signal into discrete events is done externally to the computer described herein, and the computer described herein receives the musical piece in a form which is already parsed into events. In yet other implementations, the parsing is performed by the computer described herein, using any of the existing techniques, some of which are described at the end of this description.
Step 1-4: accessing a standard performance of the musical piece.
Step 1-6: identifying a plurality of mutually corresponding events and parameters in the musical performance and in the standard performance.
Step 1-8: comparing one or more parameters of the musical performance with corresponding parameters of the standard performance.
Step 1-10: classifying the parameter values (determining one of a set of classes for the parameter value), wherein the set of classes comprises: a “correct” class for parameters within a given tolerance margin of the corresponding parameter of the standard performance; a “low” class for parameters below the corresponding parameter of the standard performance; and a “high” class for parameters above the corresponding parameter of the standard performance; and, optionally, a “missed” class for events that were omitted or events in which the parameter of the musical performance deviates from the corresponding parameter of the standard performance by more than a predetermined threshold.
Step 1-12: determining a position for the current marker as a step from the position of a previous marker, or from a start position if no previous marker exists, wherein the direction of the step indicates the determined class and wherein the length of the step is based on at least the distance from the previous marker to the corner for the determined class.
Step 1-14: displaying a current marker using one or more visual attributes that distinguish the currently displayed marker from a background and/or the previous marker, if any, in the determined position. For instance, as regards the visual attributes that distinguish the currently displayed marker from a background and/or the previous marker, the current marker may have the strongest contrast against the background, such as the brightest marker against a dark background or vice versa, and for earlier markers the contrast may be faded.
In step 1-16 the system tests if the musical performance comprises more events to process. If yes, the flow returns to step 1-2 for inputting more of the musical performance. Otherwise the process is completed.
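The classification of step 1-10 can be sketched as follows. The function name, the use of None for an undetected event and the numeric thresholds in the usage lines are hypothetical; the four class names are those used in the description.

```python
def classify(played, ideal, tolerance, miss_threshold):
    """Classify one parameter value as correct, low, high or missed.

    played: measured parameter value, or None if the event was omitted
            (assumed convention for an undetected event).
    ideal: corresponding value in the standard performance.
    """
    if played is None:
        return "missed"
    deviation = played - ideal
    if abs(deviation) > miss_threshold:
        return "missed"
    if abs(deviation) <= tolerance:
        return "correct"
    return "low" if deviation < 0 else "high"


# Pitch example in Hz: 5 Hz tolerance, 50 Hz miss threshold (made-up values).
labels = [classify(p, 440, 5, 50) for p in (442, 430, 455, 520, None)]
```

For pitch, the “low” and “high” results correspond to flat and sharp; for timing and level they correspond to early/late and too quiet/too loud.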
For timing, pitch and level, respectively, the “low” class means early, flat and quiet, while the “high” class means late, sharp and loud. Quiet or loud are not absolute levels but levels of events of the musical performance which are compared to those of the standard performance. Before comparison, the input level of the musical performance should be calibrated or normalized to a level which enables meaningful comparison with the standard performance. Reference number 2-20 denotes the origin of the area, which serves as an intuitive starting point for the step (vector) of the first event.
When a new note event is detected the current position (x, y) inside the pyramid is updated to (xnew, ynew) with a step towards the corner that represents its classification. The coordinates of the corner corresponding to the class are denoted (xc, yc). In a simple, albeit non-restrictive, implementation the step length is a fixed fraction g of the distance from the previous position to the corner of the current class. The update equations for xnew and ynew as a function of x, y, xc, and yc then become:
$$x_{\text{new}} = g_1 \cdot (x_c - x) + x \tag{1}$$
and
$$y_{\text{new}} = g_2 \cdot (y_c - y) + y \tag{2}$$
In the above equations 1 and 2, the factor g, which gives the step length as a fraction of the distance from the previous position to the selected corner, has been indicated as g1 and g2. This means that the factor g is not necessarily the same for x and y.
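Equations 1 and 2 translate directly into code. In the sketch below the function name, the tuple representation of positions and the default factor values are assumptions; the corner coordinates in the usage lines are likewise chosen only for illustration.

```python
def next_marker(position, corner, g1=0.25, g2=0.25):
    """Apply equations (1) and (2) to move a marker towards a class corner.

    position: (x, y) of the previous marker, or of the origin.
    corner: (xc, yc) of the corner for the determined class.
    g1, g2: step fractions for x and y; they need not be equal.
    """
    x, y = position
    xc, yc = corner
    x_new = g1 * (xc - x) + x  # equation (1)
    y_new = g2 * (yc - y) + y  # equation (2)
    return x_new, y_new


# Two consecutive steps from the origin towards a corner assumed at (0, 1).
first = next_marker((0.0, 0.0), (0.0, 1.0))
second = next_marker(first, (0.0, 1.0))
```

Because each step covers a fraction of the remaining distance, the marker approaches the corner without ever overshooting it.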
It was said earlier that the area 2-2 demarcated by corners 2-10, 2-12, 2-14 and 2-16 was normalized between −1 and +1 for both of the x and y axes. This is not necessary, however, and particularly the corner 2-16 for missed events can be moved closer to the origin. In
Reference number 5-10 denotes a starting marker at the origin (0, 0), while reference numbers 5-11 through 5-19 denote markers for the nine parameter values. The first three steps, denoted by markers 5-11 through 5-13 from the previous marker (or origin), are directed towards the corner 2-10 of the “correct” class, because the parameter values were classified as “correct”. For each of the first three steps, the distance from the previous marker (or origin) to the corner 2-10 of the “correct” class is reduced by the g factor, which in this example is 0.25, or one quarter. The next three steps, denoted by markers 5-14 through 5-16 from the previous marker, are directed towards the corner 2-14 of the “high” class, because the parameter values were classified as “high”. For instance, a “high” parameter value may mean late, sharp or too loud, depending on whether the parameter being evaluated is timing, pitch or level. Again, the distance to the corner 2-14 of the “high” class is reduced by 0.25 for each step. Similarly, the last three steps, denoted by markers 5-17 through 5-19 from the previous marker, are directed towards the corner 2-16 of the “missed” class, because the parameter values were classified as “missed”, or the event couldn't be detected at all. The remaining distance to the corner 2-16 of the “missed” class is reduced by 0.25 for each step.
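The quarter-step behaviour in this example follows from equations (1) and (2). As a short worked derivation, assume that the same factor g is used for both coordinates and that n consecutive steps are taken towards the same corner; the symbols d_0 and d_n, denoting the distance to that corner before and after the n steps, are introduced here for illustration only.

```latex
\begin{aligned}
x_c - x_{\text{new}} &= (1 - g)\,(x_c - x), \qquad
y_c - y_{\text{new}} = (1 - g)\,(y_c - y) \\
d_n &= (1 - g)^n \, d_0 \\
g = 0.25,\; n = 3: \quad d_3 &= 0.75^3 \, d_0 = 0.421875 \, d_0
\end{aligned}
```

Under these assumptions, three consecutive steps towards one corner leave roughly 42% of the original distance to that corner.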
Based on the description of
The architecture of the computer, generally denoted by reference numeral 8-100, comprises one or more central processing units CP1 . . . CPn, generally denoted by reference numeral 8-110. Embodiments comprising multiple processing units 8-110 are preferably provided with a load balancing unit 8-115 that balances processing load among the multiple processing units 8-110. The multiple processing units 8-110 may be implemented as separate processor components or as physical processor cores or virtual processors within a single component case. In a typical implementation the computer architecture 8-100 comprises a network interface 8-120 for communicating with various data networks, which are generally denoted by reference sign DN. The data networks DN may include local-area networks, such as an Ethernet network, and/or wide-area networks, such as the internet. The data processing system may also reside in a smart telephone, in which case reference numeral 8-125 denotes a mobile network interface, through which the smart telephone may communicate with various access networks AN.
The computer architecture 8-100 may also comprise a local user interface 8-140. Depending on implementation, the user interface 8-140 may comprise local input-output circuitry for a local user interface, such as a keyboard, mouse and display (not shown). The computer architecture also comprises memory 8-150 for storing program instructions, operating parameters and variables. Reference numeral 8-160 denotes a program suite for the server computer 8-100.
The computer architecture 8-100 also comprises circuitry for various clocks, interrupts and the like, and these are generally depicted by reference numeral 8-130. The computer architecture 8-100 further comprises a storage interface 8-145 to a storage system 8-190. When the server computer 8-100 is switched off, the storage system 8-190 may store the software that implements the processing functions, and on power-up, the software is read into semiconductor memory 8-150. The storage system 8-190 also retains operating parameters and variables over power-off periods. The various elements 8-110 through 8-150 intercommunicate via a bus 8-105, which carries address signals, data signals and control signals, as is well known to those skilled in the art.
For inputting music to be analyzed, the computer architecture 8-100 comprises at least one sound, or music, interface 8-135. By way of example, the music interface can be a microphone-level or line-level analog interface, or it can be a Universal Serial Bus (USB) interface, to name just a few of the most common types of music interfaces.
The standard performance, in turn, can be received or obtained via any of a number of digital interfaces, such as the network interfaces 8-120, 8-125, via a USB bus or a Musical Instrument Digital Interface (MIDI) bus, to name just a few of the most common types of digital interfaces. It is also possible to receive the standard performance via the music interface 8-135, possibly in analog form, and digitalize it. This way a teacher, for instance, can play a performance that the computer later uses as the standard performance.
The inventive techniques may be implemented in the computer architecture 8-100 as follows. The program suite 8-160 comprises program code instructions for instructing the processor or set of processors 8-110 to execute the functions of the inventive method, including the acts or features described in connection with
The various optional features described in connection with different drawings can be combined freely. In other words, the optional features are not restricted to the embodiments in connection with which they are first described. For instance, the non-symmetrical placement of the “correct” and “missed” corners 2-10′, 2-16′ about the x axis can be applied to the corners 2-12, 2-14 of the “low” and “high” classes. The different g1 and g2 factors for the x and y parameters in equations (1) and (2), as shown in connection with
A sequence of markers clearly distinct from one another is believed to be optimal for indicating correctness of performance to the musician during a live performance. Earlier markers can be removed or faded away as newer markers are added. For overall analysis of a recorded performance techniques other than distinct markers may be used. For instance, a “heat map” can be used to indicate which portions of the display area were most occupied during the performance. In the present context, a heat map refers to display techniques wherein the probability of a portion of a display area to be occupied is mapped to a range of visual indicators, such as varying shades of grey, different colors, hatching or textures. Alternatively, other multi-dimensional display techniques, such as 3D bar codes or surface diagrams can be used. In yet other implementations, a heat map can be overlaid by a sequence of markers, wherein the heat map shows long-term analysis and the sequence of markers indicates accuracy of recent events.
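A heat map of this kind can be approximated by a two-dimensional histogram of marker positions. The sketch below assumes that the display area is normalized to the range −1 to +1 on both axes and divides it into a square grid; the function name and the grid size are hypothetical.

```python
def heat_map(markers, bins=4):
    """Return a bins x bins grid of occupancy fractions for marker positions.

    markers: list of (x, y) positions, assumed to lie within [-1, +1].
    Each cell holds the fraction of markers that fell into it, which can
    then be mapped to shades of grey, colors, hatching or textures.
    """
    grid = [[0] * bins for _ in range(bins)]
    for x, y in markers:
        # Map [-1, +1] to a cell index; clamp the upper edge into the grid.
        col = min(int((x + 1.0) / 2.0 * bins), bins - 1)
        row = min(int((y + 1.0) / 2.0 * bins), bins - 1)
        grid[row][col] += 1
    total = len(markers)
    if total == 0:
        return [[0.0] * bins for _ in range(bins)]
    return [[count / total for count in row] for row in grid]


# Three markers on a coarse 2 x 2 grid.
cells = heat_map([(0.0, 0.9), (0.0, 0.9), (-0.9, 0.0)], bins=2)
```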
In addition to off-line analysis of a recorded performance, the invention and/or its embodiments are applicable to a real-time analysis of a live performance, and thus have the capability to bring about improvements to the motor skills of the musician. Some aspects of the invention and/or its embodiments are particularly advantageous for real-time analysis of a live performance. For instance, it is important that changes in classification of events, such as a change from early to late or from sharp to flat, are clearly presented to the musician. This way the musician, who may have altered the position of their body or hands, receives immediate feedback on the effects that the altered position has on accuracy of performance. These design goals are supported by various features, such as making the step size between markers dependent on the distance to the corner of the classification. This ensures that markers for a sequence of events rarely overlap each other, as would be the case if constant-size steps were used and the classification changed direction (e.g. low to high or vice versa). The desire to make classification changes more visible is further supported by taking a step bigger than normal when the classification changes.
In order to further improve the real-time feedback given to the musician, the computer may be programmed to change some of the factors that influence placement of the markers as the musician's performance improves. For instance the classification of event parameters into correct, low, high or missed (cf.
The tightened classification or a difficulty level changed otherwise can be indicated to the musician(s) by displaying an explicit difficulty level as a symbol, such as a number, or by changing the color, size and/or shape, or some other visual attributes, of the markers that indicate classification. For instance, when the musician has attained a level at which several consecutive events are classified as correct, the markers tend to cluster near the corner for the correct class, and it may be difficult or impossible to distinguish a newer marker from an underlying older marker. This residual problem can be solved by dynamically altering one or more visual attributes of the markers. For instance, the color of the markers may be changed according to a scheme which ensures that a newer marker is visually distinct from a co-located older marker.
While the attached drawings show only one display area 2-2 at a time, a more detailed indication to the musician can be given by showing multiple display areas 2-2 simultaneously, one for each analyzed parameter, such as start time, duration, pitch and level.
In a variation of this idea, multiple parameters can be displayed on the same display area 2-2 by using markers of different shape, size and/or color. In some implementations, the computer can indicate which of the parameters is deemed to deviate the most from the standard performance. The parameter(s) needing the most attention on the part of the musician may be highlighted on the display, or it/they can be selected as the only one(s) being displayed.
In some implementations the multiple display areas belong to multiple musicians. There are several ways to utilize display areas belonging to different musicians. For instance, the different musicians can simply compare their own displays with those of the others and see whose performance is the most (or least) accurate.
Onsets indicate when a new note is played. For the purpose of describing how onsets can be detected it is convenient to classify an onset as either hard or soft. A hard onset is characterized by a quick rise in energy, and it occurs for example when a guitarist picks a string or a drummer hits a practice pad. A soft onset is characterized by a change in the frequency content without an associated large increase in level, such as when a guitarist performs a slide or a pitch-bend (using the left hand to change the pitch without picking the strings with the right hand). Thus, detecting an onset usually requires locating a sudden increase in energy or a sudden change in the frequency spectrum. It is common to determine an onset through three stages: 1) time-frequency processing of the input signal followed by 2) feature extraction and 3) onset detection based on the value of a decision function calculated from the extracted features [Bello, 2005]. As an example of how to detect a hard onset from the energy or level increase only, the input's energy envelope can be calculated by summing up the square of the samples inside short frames, and when the energy increases over three consecutive frames an onset is detected. For soft onsets, a detection method must consider the frequency distribution so it is not enough to use only the total energy as a feature. One approach for soft onsets is to calculate the spectral flux (the sum of absolute differences between two spectra) S between adjacent frames, and when S exceeds a certain threshold an onset is detected [Bello, 2005]. Onset detection can also combine energy features with frequency features in order to make the method more robust [Klapuri, 1999]. For example, the energy envelope can be calculated for several frequency bands and the final onset detection is based on a combination of these frequency bands. Even more sophisticated methods based on machine learning exist.
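The energy-based detection of hard onsets described above can be sketched as follows. The frame size, the function names and the interpretation of "increases over three consecutive frames" as three successive frame-to-frame rises are assumptions; soft onsets would require a spectral feature such as the spectral flux and are not covered by this sketch.

```python
def frame_energies(samples, frame_size):
    """Sum of squared samples for each complete, non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, frame_size)]


def detect_hard_onsets(samples, frame_size=64, rises=3):
    """Return sample indices at which hard onsets are detected.

    An onset is reported once the frame energy has risen `rises` times in
    a row; the reported index is the start of the first rising frame.
    """
    energies = frame_energies(samples, frame_size)
    onsets = []
    run = 0  # number of consecutive frame-to-frame energy increases
    for i in range(1, len(energies)):
        if energies[i] > energies[i - 1]:
            run += 1
            if run == rises:
                onsets.append((i - rises + 1) * frame_size)
        else:
            run = 0
    return onsets


# Four silent frames followed by four frames of growing amplitude.
signal = [0.0] * 256 + [0.1] * 64 + [0.2] * 64 + [0.4] * 64 + [0.8] * 64
onsets = detect_hard_onsets(signal)
```

In this synthetic signal the energy starts rising in the fifth frame, so a single onset is reported at sample 256.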
Such methods are typically based on classifiers such as SVM (Support Vector Machine), kNN (k-Nearest Neighbour), and Naive Bayes [Chuan, 2006]. Other methods can be found in [Bello, 2003].
Pitch estimation attempts to extract the important frequencies that make up a signal. In practice it is necessary to distinguish between single-pitch and multi-pitch signals. Virtually all sounds are composed of a multitude of frequencies, but describing a signal as single-pitch effectively suggests that it is produced by a single sound source such as one vibrating guitar string or one human voice singing. Multi-pitch signals are more complex, and they can be produced by any number of simultaneous sound sources such as an orchestra or one guitarist letting several strings ring at the same time. In the case of single-pitch sounds the task is to detect the fundamental frequency, which is the lowest rate at which the signal repeats. Most methods for estimating the period of a signal are based on analyzing the autocorrelation function. See, for example, the YIN algorithm [Cheveigne, 2002]. Other pitch estimation techniques can be found in [Rabiner, 1976], [Charpentier, 1986], [Klapuri, 2000], [Tolonen, 2000].
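For single-pitch signals, the autocorrelation approach can be illustrated with a minimal sketch. It uses a plain, unnormalized autocorrelation rather than the YIN algorithm, and the function name and the search range of 80 to 1000 Hz are assumptions chosen for illustration.

```python
import math


def estimate_pitch(samples, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate the fundamental frequency of a single-pitch signal in Hz.

    The lag with the largest autocorrelation inside the search range is
    taken as the period. Returns None if no positive peak is found.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag`.
        value = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None


# 0.1 s of a 200 Hz sine sampled at 8 kHz (period of exactly 40 samples).
tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(800)]
pitch = estimate_pitch(tone, 8000.0)
```

A practical estimator would add normalization and interpolation between lags, since this sketch can only return frequencies whose period is a whole number of samples.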
Dynamics reflect whether a musician's playing is loud or quiet. It is very important as an indicator of certain types of performance problems such as a drummer consistently playing louder with the right hand than the left or a guitarist playing louder when picking with a downstroke than with an upstroke. The level of an audio signal is usually calculated by a short-term average of its energy [Boley, 2010]. When the purpose is to judge whether a sequence of events is at the same level it does not matter what that level is. However, when the musician has to play a sequence of events that has changing dynamics, such as a crescendo, it is necessary to set a reference level in order to classify events as ‘too loud’ or ‘too quiet’. It is not obvious how to normalize the level when recording a musician, since the recorded level depends on the distance to the microphone as well as on the particular musical instrument and the musician's playing style. Consequently, a reference level must be calculated in the recording situation. One method is to set the reference level L (in dB) to the loudest recorded event, and then consider the music dynamics ‘forte’ to be L. Other music dynamics can then be set relative to L. For example, ‘piano’ can be set to L − 6 dB.
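The reference-level method can be sketched as follows. The function names and the dictionary of dynamics are hypothetical; the level is computed as the mean energy of an event expressed in dB, and ‘piano’ is placed 6 dB below the loudest recorded event as in the example above.

```python
import math


def level_db(samples):
    """Level of one event in dB, from the mean energy of its samples."""
    energy = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(energy) if energy > 0.0 else float("-inf")


def dynamics_targets(event_levels_db, piano_offset_db=-6.0):
    """Set 'forte' to the loudest recorded event and 'piano' relative to it."""
    reference = max(event_levels_db)  # reference level L
    return {"forte": reference, "piano": reference + piano_offset_db}


# Three recorded events with made-up levels in dB.
targets = dynamics_targets([-20.0, -12.0, -15.0])
```

Each subsequent event can then be compared with the target for its notated dynamic and classified as too quiet or too loud.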
The MIDI 1.0 specification [MIDI, 1995] specifies that a Channel Message contains information about the timing of onsets (Note-On and Note-Off, unit MIDI-ticks), pitch (Note Numbers, integers between 0 and 127, and Pitch Bend, a 14-bit value), and dynamics (Velocity and Aftertouch, integers between 0 and 127). Consequently, everything that is needed for the classification of a note event is readily available in the Channel Messages transmitted via the MIDI protocol.
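As an illustration of how note events might be read from Channel Messages, the sketch below decodes three-byte Note-On and Note-Off messages. The function names and the returned dictionary layout are hypothetical, and the conversion from note number to frequency assumes equal temperament with A4 (note number 69) at 440 Hz.

```python
def parse_note_message(message):
    """Decode a 3-byte Note-On or Note-Off message, or return None.

    message: sequence of three integers (status byte and two data bytes).
    """
    status, data1, data2 = message
    kind = status & 0xF0      # upper nibble: message type
    channel = status & 0x0F   # lower nibble: channel 0..15
    if kind == 0x90 and data2 > 0:
        return {"type": "note_on", "channel": channel,
                "note": data1, "velocity": data2}
    # A Note-On with velocity 0 is conventionally treated as a Note-Off.
    if kind == 0x80 or kind == 0x90:
        return {"type": "note_off", "channel": channel,
                "note": data1, "velocity": data2}
    return None  # other message types are ignored in this sketch


def note_to_hz(note):
    """Convert a note number to Hz (assumes A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


event = parse_note_message(bytes([0x90, 69, 100]))
```

The decoded note number and velocity supply the pitch and dynamics parameters directly, while timing would come from the arrival time or tick count associated with each message.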
Those skilled in the art will realize that the inventive principle may be modified in various ways without departing from the spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
20135575 | May 2013 | FI | national |