This application is the National Stage under 35 U.S.C. 371 of International Application No. PCT/EP2010/051763, filed Feb. 12, 2010, which claims priority to French Patent Application No. 0950919, filed Feb. 13, 2009, the contents of which are incorporated herein by reference.
1. Field of the Invention
Various embodiments of the invention relate to the control of the playback of an audio file in real time.
2. Description of the Prior Art
Electronic musical synthesis devices make it possible to play one or more synthetic instruments (produced from acoustic models or from samples or sounds from a piano, a guitar, other string instruments, a saxophone or other wind instruments, etc.) by using an interface for entering notes. The notes entered are converted into signals by a synthesis device connected to the interface by a connector and a software interface using the MIDI (Musical Instrument Digital Interface) standard. An automatic programming of the instrument or instruments makes it possible to generate a series of notes corresponding to a score that can be performed by using software provided for that purpose. Among such software, the MAX/MSP programming software is one of the most widely used and makes it possible to create such a musical score interpretation application. Such an application comprises a graphic programming interface which makes it possible to select and control sequences of notes and to drive the musical synthesis DSP (Digital Signal Processor). In these devices, it is possible to combine a score driven by the interface which controls one of the instruments with a score for other instruments which are played automatically.
Rather than controlling synthetic instruments by a MIDI-type interface, it may be desirable to directly control an audio recording, the control making it possible, for example, to act on the playback speed and/or volume of the file. To ensure a musical synchronization of the file which is played with the playing data of the interpreter delivered by the MIDI interface, it would be particularly useful to be able to control the running rate of the score played automatically. The existing devices do not make it possible to provide this control over the playback rate of the different types of audio files used to reproduce prerecorded music on an electronic piece of equipment, such as MP3 (MPEG, Moving Picture Experts Group, 1/2 Layer 3), WAV (WAVeform audio format) and WMA (Windows Media Audio). There is no prior art device that allows for such real-time control in conditions of musicality that are acceptable.
In particular, PCT application no. WO98/19294 deals only with the control of the playback rate of MIDI files and not of files of signals encoded in a substantially continuous manner such as MP3 or WAV files.
The present application provides a response to these limitations of the prior art by using an automatic score playback control algorithm which makes it possible to provide a satisfactory musical rendition.
To this end, embodiments of the present invention disclose a control device enabling a user to control the playback rate of a prerecorded file of signals to be reproduced and the intensity of said signals, said signals being encoded in said prerecorded file in a substantially continuous manner, said device comprising a first interface module for entering control strokes, a second module for entering said signals to be reproduced, a third module for controlling the timing of said prerecorded signals and a device for reproducing the inputs of the first three modules, wherein said second module can be programmed to determine the times at which control strokes for the playback rate of the file are expected, and in that said third module is capable of computing, for a certain number of control strokes, a corrected speed factor relating to strokes preprogrammed in the second module and strokes actually entered in the first module and an intensity factor relating to the velocities of said strokes actually entered and expected, then of adjusting the playback rate of said second module to adjust said corrected speed factor on the subsequent strokes to a selected value and the intensity of the signals output from the second module according to said intensity factor relating to the velocities.
Advantageously, the first module can comprise a MIDI interface. Advantageously, the first module can comprise a motion capture submodule and a submodule for analyzing and interpreting gestures receiving as input the outputs from the motion capture submodule.
Advantageously, the motion capture submodule can perform said motion capture on at least one first and one second axes, the submodule for analyzing and interpreting gestures comprises a filtering function, a function for detecting a meaningful gesture by comparing the variation between two successive values in the sample of at least one of the signals originating from at least the first axis of the set of sensors with at least one first selected threshold value and a function for confirming the detection of a meaningful gesture, and said function for confirming the detection of a meaningful gesture can compare at least one of the signals originating from at least the second axis of the set of sensors with at least one second selected threshold value.
Advantageously, the first module can comprise an interface for capturing neural signals from the brain of the user and a submodule for interpreting said neural signals.
Advantageously, the velocity of the stroke entered can be computed on the basis of the deviation of the signal output from the second sensor.
Advantageously, the first module can also comprise a submodule capable of interpreting gestures on the part of the user, the output of which is used by the third module to control a characteristic of the audio output selected from the group consisting of vibrato and tremolo.
Advantageously, the second module can comprise a submodule for placing tags in the file of prerecorded signals to be reproduced at the times at which control strokes for the playback rate of the file are expected, said tags being generated automatically according to the rate of the prerecorded signals and being able to be shifted by a MIDI interface.
Advantageously, the value selected in the third module to adjust the playback rate of the second module can be equal to a value selected from a set of computed values, of which one of the limits is computed by application of a corrected speed factor equal to the ratio of the time interval between the next tag and the preceding tag minus the time interval between the current stroke and the preceding stroke to the time interval between the current stroke and the preceding stroke and of which the other values are computed by linear interpolation between the current value and the value corresponding to that of the limit used for the application of the corrected speed factor.
Advantageously, the value selected in the third module to adjust the playback rate of the second module can be equal to the value corresponding to that of the limit used for the application of the corrected speed factor.
Embodiments of the invention also disclose a control method enabling a user to control the playback rate of a prerecorded file of signals to be reproduced and the intensity of said signals, said signals being encoded in said prerecorded file in a substantially continuous manner, said method comprising a first interface step for entering control strokes, a second step for entering said signals to be reproduced, a third step for controlling the timing of said prerecorded signals and a step for reproducing the inputs of the first three steps, wherein said second step can be programmed to determine the times at which control strokes for the playback rate of the file are expected, and in that said third step is capable of computing, for a certain number of control strokes, a corrected speed factor relating to strokes preprogrammed in the second step and strokes actually entered in the first step and an intensity factor relating to the velocities of said strokes actually entered and expected, then of adjusting the playback rate in said second step to adjust said corrected speed factor on the subsequent strokes to a selected value and the intensity of the signals output from the second module according to said intensity factor relating to said velocities.
Another advantage of embodiments of the invention is that they make it possible to control the playback of the prerecorded audio files intuitively. New playback control algorithms can also be easily incorporated in embodiment devices. The sound power of the prerecorded audio files can also be controlled simply by embodiment devices.
A MIDI controller 110A is linked to the time control processor 30 via an interface whose hardware part is a 5-pin DIN connector. A number of MIDI controllers can be linked to the same computer by being chained together. The communication link is set up at 31,250 baud. The coding system uses 128 tonal values (from 0 to 127), the note messages being spread between the frequencies of 8.175 Hz and 12544 Hz with a half-tone resolution.
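As an illustration of the note range quoted above, the following minimal sketch maps a MIDI note number to a frequency. It assumes the conventional equal-temperament mapping with note 69 tuned to 440 Hz; that tuning reference and the function name `midi_note_to_hz` are assumptions introduced here and are not taken from the described device.

```python
def midi_note_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """Frequency of a MIDI note under 12-tone equal temperament.

    Assumption: note 69 (A4) is tuned to a4_hz; one note step is one half-tone.
    """
    if not 0 <= note <= 127:
        raise ValueError("MIDI note numbers range from 0 to 127")
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


if __name__ == "__main__":
    # Lowest and highest of the 128 tonal values.
    print(midi_note_to_hz(0), midi_note_to_hz(127))
```

Under this assumed tuning, notes 0 and 127 fall at roughly 8.18 Hz and 12544 Hz, which matches the range stated above.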
A MotionPod comprises a triaxial accelerometer, a triaxial magnetometer, a preprocessing capability that can be used to preform the signals from the sensors, a radiofrequency transmission module for transmitting said signals to the processing module itself and a battery. This motion sensor is said to be “3A3M” (three accelerometer axes and three magnetometer axes). The accelerometers and magnetometers are inexpensive market-standard microsensors with small bulk and low consumption, for example a three-channel accelerometer from Kionix™ (KXPA4 3628) and Honeywell™ magnetometers of HMC1041Z type (1 vertical channel) and HMC1042L type for the 2 horizontal channels. There are other suppliers: Memsic™ or Asahi Kasei™ for the magnetometers and STMT™, Freescale™, Analog Devices™ for the accelerometers, to name only a few. In the MotionPod, the 6 signal channels undergo only analogue filtering; then, after analogue-to-digital conversion (12-bit), the raw signals are transmitted by a radiofrequency protocol in the Bluetooth™ band (2.4 GHz) optimized for consumption in this type of application. The data therefore arrive raw at a controller which can receive the data from a set of sensors. The data are read by the controller and made available to the software. The sampling rate can be adjusted. By default, it is set to 200 Hz. Higher values (up to 3000 Hz or even more) may nevertheless be envisaged, allowing for a greater accuracy in the detection of impacts for example. The radiofrequency protocol for MotionPod makes it possible to ensure that the datum is made available to the controller with a controlled delay, which in this case preferably does not exceed 10 ms (at 200 Hz), which is important for the music.
An accelerometer of the above type makes it possible to measure the longitudinal displacements on its three axes and, by transformation, angular displacements (except those resulting from a rotation around the direction of the earth's gravitational field) and orientations relative to a Cartesian coordinate system in three dimensions. A set of magnetometers of the above type makes it possible to measure the orientation of the sensor to which it is fixed relative to the earth's magnetic field and therefore displacements and orientations relative to the three axes of the coordinate system (except around the direction of the earth's magnetic field). The 3A3M combination supplies complementary and smoothed motion information.
The AirMouse comprises two gyro-type sensors, each with one rotation axis. The gyrometers used are Epson brand, reference XV3500. Their axes are orthogonal and deliver the angles of pitch (rotation about the axis parallel to the horizontal axis of a plane situated facing the user of the AirMouse) and of yaw (rotation about an axis parallel to the vertical axis of a plane situated facing the user of the AirMouse). The instantaneous pitch and yaw speeds measured by the two gyro axes are transmitted by radiofrequency protocol to a controller of the movement of a cursor on a screen situated facing the user.
The module for analyzing and interpreting gestures 120B supplies signals that can be directly used by the timing control processor 30. For example, the signals from an axis of the accelerometer and of the magnetometer of the MotionPod are combined according to the method described in the patent application filed by the present applicants and entitled “DEVICE AND METHOD FOR INTERPRETING MUSICAL GESTURES”. The processing operations implemented in the module 120B are performed by software.
The processing operations comprise, first of all, a low-pass filtering of the outputs from the sensors of the two modalities (accelerometer and magnetometer).
This filtering of the signals output from the controller of motion sensors uses a first order recursive approach. The gain of the filter may, for example, be set to 0.3. In this case, the filter equation is given by the following formula:
Output(z(n))=0.3*Input(z(n−1))+0.7*Output(z(n−1))
In which, for each of the modalities:
z is the reading of the modality on the axis of the sensor which is used;
n is the index of the current sample;
n−1 is the index of the preceding sample.
The processing then comprises a low-pass filtering of the two modalities with a cut-off frequency less than that of the first filter. This lower cut-off frequency results in the choice of a coefficient for the second filter that is less than the gain of the first filter. In the case chosen in the above example in which the coefficient of the first filter is 0.3, the coefficient of the second filter may be set to 0.1. The equation for the second filter is then (with the same notations as above):
Output(z(n))=0.1*Input(z(n−1))+0.9*Output(z(n−1))
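The two recursive filters above can be sketched as follows. This is a minimal illustration and not the applicants' implementation: the function names, the zero initial state, and the choice of feeding the second filter with the output of the first are assumptions, since the text does not specify them.

```python
from typing import Iterable, List, Tuple


def recursive_lowpass(samples: Iterable[float], gain: float,
                      initial: float = 0.0) -> List[float]:
    """First-order recursive low-pass, as written in the text:
    Output(n) = gain * Input(n-1) + (1 - gain) * Output(n-1).

    Assumption: input and output both start at `initial` before sample 0.
    """
    out: List[float] = []
    prev_in = initial
    prev_out = initial
    for x in samples:
        y = gain * prev_in + (1.0 - gain) * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out


def two_stage_filter(samples: Iterable[float], gain1: float = 0.3,
                     gain2: float = 0.1) -> Tuple[List[float], List[float]]:
    """Return (first-filter output, second-filter output).

    Assumption: the second, lower cut-off filter is cascaded on the first.
    """
    first = recursive_lowpass(samples, gain1)
    second = recursive_lowpass(first, gain2)
    return first, second
```

With a constant input, both outputs converge to the input value, the second more slowly than the first, which is the expected behavior of a lower cut-off frequency.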
Then, the processing comprises a detection of a zero in the derivative of the signal output from the accelerometer with the measurement of the signal output from the magnetometer.
The following notations are used:
Then, the following equation can be used to compute a filtered derivative of the signal from the accelerometer in the sample n:
FDA(n)=AF1(n)−AF2(n−1)
A negative sign for the product FDA(n)*FDA(n−1) indicates a zero in the derivative of the filtered signal from the accelerometer and therefore detects a stroke.
For each of these zeros of the filtered signal from the accelerometer, the processing module checks the intensity of the deviation of the other modality at the filtered output of the magnetometer. If this value is too low, the stroke is considered not to be a primary stroke but to be a secondary or ternary stroke, and is discarded. The threshold for discarding the non-primary strokes depends on the expected amplitude of the deviation of the magnetometer. Typically, this value will be of the order of 5/1000 in the applications envisaged. This part of the processing therefore makes it possible to eliminate the meaningless strokes.
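The stroke detection and confirmation described above can be sketched as follows. Several points are assumptions made for illustration: AF1 and AF2 are taken to be the accelerometer signal after the first and second filters, BF1 and BF2 the corresponding magnetometer signals, and the magnetometer deviation is taken as the absolute value of BF1(n)−BF2(n). The function name and the default threshold of 0.005 (the 5/1000 order of magnitude quoted above) are likewise illustrative.

```python
from typing import List, Sequence


def detect_primary_strokes(af1: Sequence[float], af2: Sequence[float],
                           bf1: Sequence[float], bf2: Sequence[float],
                           threshold: float = 0.005) -> List[int]:
    """Return the sample indices at which a primary stroke is detected."""
    if not (len(af1) == len(af2) == len(bf1) == len(bf2)):
        raise ValueError("all signals must have the same length")
    strokes: List[int] = []
    prev_fda = None
    for n in range(1, len(af1)):
        # Filtered derivative of the accelerometer: FDA(n) = AF1(n) - AF2(n-1).
        fda = af1[n] - af2[n - 1]
        # A negative product FDA(n) * FDA(n-1) marks a zero of the derivative.
        if prev_fda is not None and fda * prev_fda < 0:
            # Confirm with the magnetometer deviation (assumed measure);
            # candidates below the threshold are discarded as non-primary.
            if abs(bf1[n] - bf2[n]) >= threshold:
                strokes.append(n)
        prev_fda = fda
    return strokes
```

A candidate whose magnetometer deviation stays below the threshold is simply dropped, which corresponds to discarding secondary or ternary strokes.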
A prerecorded music file 20 in one of the standard formats (MP3, WAV, WMA, etc.) is sampled on a storage unit by a playback device. This file has another file associated with it containing timing marks or “tags” at predetermined instants; for example, the table below indicates nine tags at the instants in milliseconds which are indicated alongside the index of the tag, after the comma:
The tags can advantageously be placed at the beats of the same index in the piece which is being played. There is however no limitation on the number of tags. There are a number of possible techniques for placing tags in a piece of prerecorded music:
The module 20 for entering prerecorded signals to be reproduced can process different types of audio files, in the MP3, WAV, WMA formats. The files may also include multimedia content other than a simple sound recording. They may contain, for example, video content, with or without soundtracks, which will be marked with tags and whose playback can be controlled by the input module 10.
The timing control processor 30 handles the synchronization between the signals received from the input module 10 and the piece of prerecorded music 20, in a manner explained in the commentaries to
The audio output 40 reproduces the piece of prerecorded music originating from the module 20 with the rhythm variations introduced by the input control module 10 interpreted by the timing control processor 30. This can be done with any sound reproduction device, notably headphones and loudspeakers.
On the first stroke entered on the MIDI keyboard 110A, identified by the motion sensor 110B or interpreted directly as a thought from the brain 110C, the audio playback device of the module 20 starts playing the piece of prerecorded music at a given rate. This rate may, for example, be indicated by a number of small preliminary strokes. Each time the timing control processor receives a stroke signal, the current playing speed of the user is computed. This may, for example, be expressed as the speed factor SF(n) computed as the ratio of the time interval between two successive tags T, n and n+1, of the prerecorded piece to the time interval between two successive strokes H, n and n+1, on the part of the user:
SF(n)=[T(n+1)−T(n)]/[H(n+1)−H(n)]
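The speed factor formula above can be illustrated by a short sketch. The function and parameter names are hypothetical; the only requirement assumed is that tag times and stroke times are each expressed in a consistent unit, for example milliseconds.

```python
def speed_factor(tag_n: float, tag_next: float,
                 stroke_n: float, stroke_next: float) -> float:
    """SF(n) = [T(n+1) - T(n)] / [H(n+1) - H(n)]."""
    stroke_interval = stroke_next - stroke_n
    if stroke_interval <= 0:
        raise ValueError("successive strokes must be strictly increasing in time")
    return (tag_next - tag_n) / stroke_interval


if __name__ == "__main__":
    # Tags 500 ms apart, strokes 250 ms apart: the user plays twice as fast.
    print(speed_factor(0.0, 500.0, 0.0, 250.0))
```

A value above 1 indicates that the user is playing faster than the prerecorded piece, and a value below 1 that the user is playing more slowly.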
In the case of
In the case of
Three tag positions at the instant n+2 (in the time scale of the audio file) before change of speed of the playback device are indicated in
CSF is the ratio of the time interval from the stroke n+1 to the tag n+2 to the time interval from the stroke n+1 to the stroke n+2. Its computation formula can be as follows:
CSF={[T(n+2)−T(n)]−[H(n+1)−H(n)]}/[H(n+1)−H(n)]
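The corrected speed factor can be sketched directly from the formula above. The function and parameter names are hypothetical, and the sketch assumes that tag and stroke times are expressed in the same unit.

```python
def corrected_speed_factor(tag_n: float, tag_n2: float,
                           stroke_n: float, stroke_n1: float) -> float:
    """CSF = {[T(n+2) - T(n)] - [H(n+1) - H(n)]} / [H(n+1) - H(n)]."""
    stroke_interval = stroke_n1 - stroke_n
    if stroke_interval <= 0:
        raise ValueError("successive strokes must be strictly increasing in time")
    return ((tag_n2 - tag_n) - stroke_interval) / stroke_interval


if __name__ == "__main__":
    # Tags n and n+2 are 1000 ms apart; the user struck after 400 ms instead of 500 ms.
    print(corrected_speed_factor(0.0, 1000.0, 0.0, 400.0))
```

In this worked case the user is ahead of the tags, so the computed factor is 1.5 and playback speeds up to reach tag n+2 by the next expected stroke. A user striking exactly on the tags (strokes 500 ms apart) yields a factor of 1.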
It is possible to enhance the musical rendition by smoothing the profile of the tempo of the player. For this, instead of adjusting the playback speed of the playback device as indicated above, it is possible to calculate a linear variation between the target value and the starting value over a relatively short duration, for example 50 ms, and to change the playback speed through these different intermediate values. The longer the adjustment time, the smoother the transition. This provides for a better rendition, notably when numerous notes are played by the playback device between two strokes. However, the smoothing is obviously done to the detriment of the dynamics of the musical response.
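The linear smoothing described above can be sketched as a list of intermediate playback speeds. The 50 ms duration comes from the text; the 10 ms update step and the function name `speed_ramp` are assumptions chosen for illustration.

```python
from typing import List


def speed_ramp(start: float, target: float,
               duration_ms: float = 50.0, step_ms: float = 10.0) -> List[float]:
    """Linearly interpolated playback speeds from `start` to `target`.

    One value is produced per update step; the last value equals `target`.
    """
    if duration_ms <= 0 or step_ms <= 0:
        raise ValueError("durations must be positive")
    steps = max(1, round(duration_ms / step_ms))
    return [start + (target - start) * k / steps for k in range(1, steps + 1)]


if __name__ == "__main__":
    # Five intermediate speeds applied every 10 ms over 50 ms.
    print(speed_ramp(1.0, 1.5))
```

Increasing `duration_ms` gives more intermediate values and a smoother transition, at the cost of a slower response to the player's strokes.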
Another enhancement, applicable to the embodiment comprising one or more motion sensors, consists in measuring the stroke energy of the player or velocity to control the volume of the audio output. The way in which the velocity is measured is also disclosed in the patent application filed by the present applicants and entitled “DEVICE AND METHOD FOR INTERPRETING MUSICAL GESTURES”.
For all the primary strokes detected, the processing module computes a stroke velocity (or volume) signal by using the deviation of the filtered signal at the output of the magnetometer.
By using the same notations as above in commentary to
DELTAB(n)=BF1(n)−BF2(n)
The minimum and maximum values of DELTAB(n) are stored between two detected primary strokes. An acceptable value VEL(n) of the velocity of a primary stroke detected in a sample n is then given by the following equation:
VEL(n)=Max{DELTAB(n),DELTAB(p)}−Min{DELTAB(n),DELTAB(p)}
In which p is the index of the sample in which the preceding primary stroke was detected. The velocity is therefore the travel (max-min difference) of the derivative of the signal between two detected primary strokes, characteristic of musically meaningful gestures.
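The velocity computation can be sketched as follows. This sketch reads the formula as the max-min travel of DELTAB over all samples from p to n, which follows the statement that the minimum and maximum are stored between two detected primary strokes; that reading, the function name, and the treatment of BF1 and BF2 as the magnetometer signal after the first and second filters are assumptions.

```python
from typing import Sequence


def stroke_velocity(bf1: Sequence[float], bf2: Sequence[float],
                    p: int, n: int) -> float:
    """Travel of DELTAB(k) = BF1(k) - BF2(k) for k from p to n inclusive.

    p is the sample index of the preceding primary stroke, n the current one.
    """
    if len(bf1) != len(bf2):
        raise ValueError("bf1 and bf2 must have the same length")
    if not 0 <= p <= n < len(bf1):
        raise ValueError("require 0 <= p <= n < number of samples")
    deltab = [bf1[k] - bf2[k] for k in range(p, n + 1)]
    return max(deltab) - min(deltab)
```

The resulting value could then be scaled to a volume or intensity factor for the audio output; the scaling itself is not specified in the text and is left out here.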
It is also possible to envisage, in this embodiment comprising a number of motion sensors, controlling, by other gestures, other musical parameters such as the spatial origin of the sound (or panning), the vibrato or the tremolo. For example, a sensor in a hand will make it possible to detect the stroke whereas another sensor held in the other hand will make it possible to detect the spatial origin of the sound or the tremolo. Rotations of the hand may also be taken into account: when the palm of the hand is horizontal, a value of the spatial origin of the sound or of the tremolo is obtained; when the palm is vertical, another value of the same parameter is obtained; in both cases, the movements of the hand in space provide the detection of the strokes.
In the case where a MIDI keyboard is used, the controllers conventionally used may also be used in this embodiment of the invention to control the spatial origin of the sounds, the tremolo or the vibrato.
The invention can advantageously be implemented by processing the strokes via a MAX/MSP program.
The display shows the waveform associated with the audio piece loaded into the system. There is a conventional part for listening to the original piece. Bottom left there is a part, represented in
In the right hand column, the acceleration/slowing down coefficient SF is computed by comparison between the period between two consecutive strokes on the one hand in the original piece and on the other hand in the actual playing of the user. The formula for computing the speed factor is given above in the description. In the central column, a timeout is set in order to stop the audio playback if the user makes no further stroke for a time dependent on the current musical content. The left hand column contains the core of the control system. It relies on a timing compression/expansion algorithm. The difficulty is in transforming a “discrete” control, therefore a control occurring at consecutive instants, into an even modulation of the speed. By default, the listening suffers on the one hand from total interruptions of the sound (when the player slows down), and on the other hand from clicks and abrupt jumps when said player speeds up. These defects, which make such an approach unrealistic because of a musically unusable audio output, are resolved by the various embodiment implementations developed, which include:
The examples described above are given as an illustration of embodiments of the invention. They in no way limit the scope of the invention which is defined by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
09 50919 | Feb 2009 | FR | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2010/051763 | 2/12/2010 | WO | 00 | 11/22/2011 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2010/092140 | 8/19/2010 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5585584 | Usa | Dec 1996 | A |
5662117 | Bittman | Sep 1997 | A |
5663514 | Usa | Sep 1997 | A |
5792972 | Houston | Aug 1998 | A |
6376758 | Yamada et al. | Apr 2002 | B1 |
20070000374 | Clark et al. | Jan 2007 | A1 |
20070270667 | Coppi et al. | Nov 2007 | A1 |
Number | Date | Country |
---|---|---|
102 22 315 | Dec 2003 | DE |
102 22 355 | Dec 2003 | DE |
1 130 570 | Sep 2001 | EP |
1 850 318 | Oct 2007 | EP |
WO 9819294 | May 1998 | WO |
WO 02093577 | Nov 2002 | WO |
Entry |
---|
International Search Report and Written Opinion of the ISA dated Nov. 26, 2010 issued in counterpart International Application No. PCT/EP2010/051763. |
Number | Date | Country | |
---|---|---|---|
20120059494 A1 | Mar 2012 | US |