The present disclosure relates to a computer implemented musical instrument, and more particularly to a computer implemented percussion instrument, and still more particularly to a computer implemented percussion instrument utilising motion capture and analysis to mitigate the need for physical surfaces to drum on.
Existing percussion instruments can be divided into four classes.
1) Traditional percussion instruments where the sound is produced by the physical shocks between the drummer's hand or the implement held by the drummer, and the drumming surfaces.
2) Electronic devices consisting of a set of electronic pads configured in such a way as to mimic the layout of their non-electronic counterpart (see 1 above). The electronic pads register the drummer's hits and sounds are synthesised or played back in accordance.
3) Electronic devices arranged in a more practical form factor, such as a roll-up mat, or detached flexible pads, or a set of pads arranged on a board.
4) Software taking advantage of touch screen devices to let the user drum by tapping the screen.
Class 1) instruments (traditional drums) are loud and are not always usable in dense housing environments or late at night.
Classes 1) and 2) share the drawback of their size and complexity to set up. The usual modern rock or jazz drum kit necessitates a car or bigger vehicle for transport. It is cumbersome to disassemble and reassemble, tasks that commonly take tens of minutes.
For a rock or jazz band that does not own a permanent studio, this is the foremost obstacle to organising rehearsal sessions. Classes 1) and 2) are also expensive musical instruments, with starting prices in the hundreds of pounds.
The main drawback of class 3) and 4) devices is that they do not give the drummer the range of musical expression that class 1) drums do. Their layout is not compatible with the wide arm motions commonly used in drumming.
Compromises in pad design for portability and flexibility also make them less sensitive to variations in drumming accents. Touch-screen devices are even less able to capture accents.
Both class 3) and 4) devices require the addition of switch pedals to capture foot drumming. These can be cumbersome and expensive, like the pedals used in class 2) devices, if they are to emulate the musical expression capacity of class 1) instruments.
Systems have been proposed for drumming without the need for surfaces to hit. The Airdrums, invented in 1986 by Palmtree Instruments, used electronic wands containing accelerometers. They did not meet commercial success, possibly because drummers felt that the weight of the wands was too cumbersome.
Several newer products aimed at the toy market, such as the Silverlit V Beat Drumsticks and the MiJam Pro Air Drummer, have appeared since. The range of expression they provide is very limited and they suffer from the same drawback as the original Airdrums.
In 2006, French students demonstrated the Virtual Drums system, which uses two cameras to reconstruct the 3D location of drumstick tips over time. The system uses this information to detect collisions with virtual drumming surfaces arranged in 3D space to mimic the layout of a rock drum kit, and plays back the corresponding drum sounds. This approach is very unintuitive for a drummer.
Embodiments of the present disclosure aim to enable a person to drum without the need for physical surfaces to hit, while providing a level of musical expression on par with physical percussion instruments. Embodiments of the present disclosure observe the drumming gestures of the user and analyse them to produce the drum sounds that the user intends.
In an aspect there is provided a musical instrument comprising: an imager arranged to provide a series of two dimensional images of an operator of the musical instrument; a processor, coupled to receive the images, wherein the processor is operable to determine the position of at least two markers in the images and the processor is configured to distinguish between each of the at least two markers in a selected image based on at least one of: the position and/or size of markers in the selected image, and the position and/or size of markers in at least one preceding image of the series of images; and the processor is configured to trigger an audio output signal based on the movements and/or position of at least one of the markers. The processor may be configured so that, in the event that at least one of the markers completes a selected sequence of movements, the processor selects an audio signal for output based on the determined two dimensional position of the marker and/or imaged size of the marker.
In an aspect there is provided a musical instrument comprising: an imager arranged to provide a series of two dimensional images of an operator of the musical instrument; a processor coupled to receive the images and configured to determine the position of a marker in the images and, in the event that the marker completes a selected sequence of movements, to select an audio signal for output based on the position of the marker in the image and/or the imaged size of the marker; and the processor is configured to trigger an audio output signal based on the movements and/or position of at least one of the markers. These and other aspects and examples of the disclosure may enable the processor and imager to infer three-dimensional position information from a series of two-dimensional images, such as those collected from a single camera.
The processor may be configured to store an indication of the position and/or size of a marker in an image of the series for use in distinguishing between at least two markers of a subsequent image of the series. The processor may be configured to identify whether each marker present in an image was also present in a preceding image of the series, and to store an indication of the presence or absence of each marker in the preceding image. The processor may be configured to determine, for each marker that was present in the preceding image, whether that marker was also present in a second preceding image and to determine the change in position and/or the change in size of the marker between the two preceding images, and in which the processor is configured to distinguish between at least two markers based on at least one of said changes.
The selected sequence of movements may comprise at least one reversal in the movement of a marker. A reversal may comprise the marker moving in a first direction for at least a selected first number of images, followed by a movement in a second direction, opposite to the first direction for at least a selected second number of images. The processor may be configured to provide an audio output signal timed to coincide with the at least one reversal. The audio signal may be triggered only in the event that an estimated speed of the marker prior to the reversal exceeds a selected threshold speed, and the processor may be configured to control the volume of the audio signal based on the speed of the marker.
The imager may comprise a camera, such as a digital camera, and in some examples the imager may consist solely of a single camera, in which case the images consist solely of a series of images collected from that single camera.
The marker may comprise a retro-reflector carried by the operator and the instrument may further comprise a lamp positioned in proximity to the imager so as to illuminate the imager by reflecting light from the retro-reflector when, in use, the retro-reflector is arranged to direct light towards the imager. The retro-reflector being arranged to direct light towards the imager enables the retro-reflector to be visible (e.g. detected and/or imaged) by the imager.
The imager may comprise a digital camera coupled to a wide angle conversion lens.
In an aspect, to configure the musical instrument, the processor may be configured to communicate an indication of an audio signal to a user, and to store an association between the audio signal and the position and/or size of a marker in response to the marker completing a selected sequence of movements. This indication of an audio signal may comprise the name and/or another visual indication of a musical instrument, e.g. the name “high hat”, or a picture of a “high hat”.
The selected sequence of movements may comprise at least one reversal in the movement of the marker, and selecting an audio signal for output may comprise selecting the audio signal based on the stored association.
In an aspect there is provided a computer implemented method of processing images to control audio signals so as to simulate a musical instrument, the method comprising: receiving a series of two dimensional images of an operator of the musical instrument; determining the position of at least two markers in the images; distinguishing between each of the at least two markers in a selected image based on at least one of: the position and/or size of markers in the selected image, and the position and/or size of markers in at least one preceding image of the series of images; and triggering an audio output signal based on the movements and/or position of at least one of the markers.
The method may comprise selecting an audio signal for output based on the determined position of the marker and/or the size of the marker in the event that at least one of the markers completes a selected sequence of movements. The method may also comprise processing images to control audio signals so as to simulate a musical instrument, the method comprising: receiving a series of two dimensional images of an operator of the musical instrument; determining the position of a marker in the images and, in the event that the marker completes a selected sequence of movements, selecting an audio signal for output based on the position of the marker in the image and/or the imaged size of the marker; and triggering an audio output signal based on the movements and/or position of at least one of the markers.
The method may comprise storing an indication of the position and/or size of a marker in an image of the series for use in distinguishing between at least two markers of a subsequent image of the series. The method may comprise identifying whether each marker present in an image was also present in a preceding image of the series, and storing an indication of the presence or absence of each marker in the preceding image.
The processor may be configured to determine, for each marker that was present in the preceding image, whether that marker was also present in a second preceding image and to determine the change in position and/or the change in size of the marker between the two preceding images, and in which the processor is configured to distinguish between at least two markers based on at least one of said changes. The audio signal may, in some examples, be triggered only in the event that an estimated speed of the marker prior to the reversal exceeds a selected threshold speed.
Embodiments of the disclosure may comprise a computer program product operable to program a processor to perform any method described herein, and/or an electronic message comprising a computer program operable to program a processor to perform such a method.
The disclosure also provides a kit for adapting a computer to provide a musical instrument, the kit comprising: a wide angle lens adapter for a digital camera and a lamp, coupled to the wide angle lens adapter so as to illuminate the wide angle lens adapter by reflecting light from a retro-reflector when, in use, the retro-reflector is directed towards the adapter. The kit may further comprise at least one retro-reflector to be carried by a user, and/or a computer program product to program a processor to perform any method described herein.
Features of the methods disclosed herein may also be embodied in apparatus configured to perform the method steps described. In addition, features of the apparatus may be provided by method steps.
There is also disclosed a musical percussion instrument based on motion capture and analysis. In this example, markers held or worn by the musician are observed by an imager to produce a series of two dimensional images over the time of the performance. The images may be received by a processor. The processor can be configured to distinguish between the different markers (e.g. left hand, right hand, right foot) by comparing the position and/or size of the un-identified markers in the current image to the position and size of identified markers in the previous images. The processor may analyse the movement of each marker over time and detect a drum hit when a marker undergoes a sharp reversal of its motion direction after having reached a sufficient speed (e.g. a speed greater than a selected threshold). The processor may determine which drum the musician intends to hit by comparing the position and size of the marker at the instant of the hit to the position and size attributes of each drum. The position and size attributes of each drum may be pre-determined and can be set by the musician before the performance according to a procedure disclosed in the application. The processor may trigger and output audio signals when drum hits are detected, e.g. virtual “drum hits” detected based on the user completing a selected series of movements. The processor may select the nature of each audio signal according to which drum it determined was hit. The volume of the audio signal may be computed by the processor as a function of the speed of the marker that triggered the drum hit in the instants before the hit.
A first aspect of the disclosure provides an apparatus for capturing part of the motion of the user's drumsticks (or hands) and feet. It comprises:
The apparatus may also comprise a lamp configured to illuminate the markers during a drumming session. The lamp may be configured to illuminate all of the markers and/or to illuminate the markers at all times during a drumming session. The use of a lamp is of particular advantage where the markers are retro-reflective.
The camera may be configured to observe the markers during the session and to continuously capture pictures; in these embodiments the camera transmits each picture it captures to the computer; and the computer program processes each picture to infer the 2D position and size of each marker within each picture; and the computer program analyses changes in marker positions and sizes over time (previous consecutive pictures) to infer whether or not to play sounds at the current time (current picture), and the nature and intensity of those sounds. Capturing pictures continuously may comprise capturing pictures at a selected frame rate. The camera may be configured to transmit each picture to the computer within a selected time period, for example “immediately”—which should be taken to include transmission performed as quickly as the camera is able, e.g. within a time period fixed by the inherent latency of the process performed by the camera.
An advantage of this apparatus over prior art is its simplicity due to the lack of need to recover 3D motion.
A second aspect of the disclosure provides a description of the gesture that enables the user to convey their drumming intent with an apparatus such as the one presented above. This description encompasses the frame of mind that the user can adopt to reproduce the gesture in an intuitive fashion.
The gesture may comprise a downward swing as in normal drumming, followed by a sudden locking of the relevant joints at the instant of the intended drum hit. For a drumstick or hand hit, the relevant joints are shoulder, elbow, wrist and finger joints. For a foot hit, the relevant joints are hip, knee, ankle and toe joints. This gesture may be referred to as the drumming gesture.
The frame of mind that a user can adopt to execute this gesture intuitively in a way that expresses their musical intent, consists in pretending to encounter an obstacle during the downward swing of the drumstick, hand or foot, thus mimicking the sudden stop of the drumstick, hand or foot that would result.
When an obstacle is actually present, such as when the user mimics a bass drum hit with their heel on the floor, thus hitting the floor with the ball of their foot, the resulting motion pattern of the corresponding marker is similar to the one that would be generated by the drumming gesture described above. Embodiments of the disclosure may therefore be able to recognise the drumming intent in that case as well.
An advantage of this gesture over an approach that consists in checking intersections with virtual drumming surfaces, is that it overcomes the drawbacks caused by the lack of visual and haptic feedback. Embodiments of the disclosure may avoid the need for the user and/or the apparatus to locate a virtual surface, and may also improve the timing of drum hits and may enable accents to be conveyed more accurately. The term “drum” may include any drum kit element, including cymbals.
A third aspect of the disclosure provides a process by which the user can calibrate the apparatus to match their drumming conditions. It comprises: a placement phase in which the computer program guides the user in placing the lamp and camera to match the space where they intend to drum; and a drum kit configuration phase in which the computer program lets the user choose the components of their drum kit and guides them in placing those components within the space where they intend to drum.
A fourth aspect of the disclosure provides a process to let a user navigate and choose from computer menus by applying the recognition of the drumming gesture (second aspect) by the apparatus (first aspect). It comprises: the displaying of menu items by the computer, in either a visual or auditory form; and the interpretation of a drumming gesture as the selection of a menu item if the location and size of the relevant marker when the gesture is recognised match those that were attributed to the menu item.
A fifth aspect of the disclosure provides a process by which the computer program automatically generates and displays standard music notation for the drumming session at the same time as the user is drumming it.
Embodiments of the disclosure will now be described, by way of example only, with reference to the accompanying drawings, in which:
As illustrated in
The material used for the marker body may be plastic, rubber, wood or cotton. The diameter of the marker may be within the 0.8 cm to 8 cm range.
The retro-reflective material may consist of a different tape, or of a paint or coating. The markers may comprise balls, although this is merely an example and markers of other shapes may be used.
Alternatively or additionally, the drumstick tip markers may be luminous. In that case, the marker body is hollow and made of a translucent material such as thin plastic. A lamp such as one or several light emitting diodes is placed within the hollow of the marker body. The lamp may be powered by common consumer batteries placed on or inside the drumstick or marker.
The drumsticks may be dispensed with and the markers placed on a finger of each hand. The marker may then consist of a thimble-like object with a smooth marker shape. It may be retro-reflective or luminous. In the luminous case, the battery may be placed on the wrist by way of a wrist band if not placed within the marker.
In the example of
In the example of
The part of the foot piece resting on top of the foot may have any material and shape that ensures that the side of the foot piece facing away from the user when worn makes an angle theta 306 with the vertical between 10 degrees and 60 degrees. The dimensions of the shape may be between 2 cm and 15 cm in height 303, between 2 cm and 8 cm in depth 304, and between 2 cm and 6 cm in base width 305. The retro-reflective patch may have any concave shape of area between 1 square cm and 10 square cm. The dimensions of the elastic band may be between 0.2 cm and 6 cm in width. Its circumference may be chosen to match the range of foot circumferences observed in children and adults of both sexes. A size adjustment loop may be fitted to the elastic band.
Each foot piece may be luminous rather than retro-reflective. The part of the foot piece resting at the top of the foot may be hollow and made of a translucent material such as plastic. A lamp such as one or several light emitting diodes and standard consumer batteries powering it may be placed within this part.
For the remainder of this document the drumstick or finger markers are referred to as hand markers, and the foot piece markers as foot markers.
The computer 104 may be any device that is capable of:
The computer 104 may also be capable of powering devices and performing data input/output through a USB port.
In the example of
In some examples the vertical and horizontal fields of view of the camera are greater than 60 degrees, and in these and other examples the wide angle conversion lens may be unnecessary.
The wide angle conversion lens 402 may comprise:
The wide angle conversion lens may be any device that can extend the field of view of the chosen camera beyond 60 degrees vertically and horizontally.
In the example of
The lamp may be provided by any light source. In some examples the lamp comprises a light source operable to emit light from a volume in space smaller than 64 cubic centimetres, and/or operable to be conveniently placed so that the light emitting part is at a distance less than 2 cm from the lens of the digital camera, and/or operable to provide an illumination cone wider than 60 degrees, and/or having a rating above 150 lumens.
The light emitted by the lamp may not be in the visible spectrum, for example it may be infra-red, and the camera may be configured to be sensitive to light within the lamp's spectrum. The different components of the apparatus may be configured so as to ensure that the computer program receives pictures that contain all the markers at each instant of the drumming session. The markers may be assumed to remain within a volume corresponding to playing on a modern rock drum kit. This volume is referred to in the remainder of this document as the “drumming volume”.
In the example of
In the example of
The drumming gesture when using the foot is the exact counterpart of the stick or hand drumming gesture; the joints that have to be locked at the instant when a drum sound is desired are the hip, knee, ankle and toe joints. To reproduce this gesture in an intuitive manner, the user may think of it as mimicking what would happen when a physical drum foot pedal reaches the end of its course while depressing it.
In some examples, the user will hit the floor with the ball of their foot at the end of the foot drumming gesture, thus making it very similar to using an actual foot pedal. This is not strictly necessary: a user may perform the gesture with their foot remaining in the air, as long as they stop the motion of the ball of the foot at the desired instant by locking the joints mentioned above. Examples of this are when drumming while standing on one foot, or while seated with one leg resting on the other knee.
The foot piece marker may be replaced by a marker attached to the ankle, knee or thigh with an elastic band. In that case, the foot drumming gesture consists in hitting the floor with the heel while the ball of the foot remains on the floor. This causes the motion of the ankle, knee or thigh marker to have a pattern equivalent to that of the drumming gesture described above. Such a marker location is also suitable to detect drumming gestures that originate with the thigh joint.
The computer program extracts the position and size of markers from each picture received from the camera in turn according to the following algorithm:
1. A binary threshold is applied to the picture to conserve the brighter pixels corresponding to the markers (marker pixels) and discard the darker pixels corresponding to everything else. Some pixels are labelled as dead pixels and discarded regardless of how bright they are.
2. A blob extraction algorithm is applied to group the bright pixels into connected components. The algorithm iterates through each line of the picture to extract connected segments of marker pixels. Those segments are grouped together with segments of the previous line to form connected components if they overlap. The number of pixels of each connected component is updated when new segments are added to it.
3. The four largest connected components in terms of pixel count are chosen to correspond to the four markers. Each marker's size (radius in pixels) is computed as √(c/π) where c is the pixel count of the connected component corresponding to the marker. Each marker's 2D position is computed as the centre of mass of the pixels belonging to the connected component corresponding to it, expressed in picture coordinates (x 703, y 704). The position coordinates (and size) of each marker are stored as floating point numbers, since the centre of mass of a connected component comprising many pixels allows for sub-pixel accuracy.
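The three steps above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name `extract_markers`, the representation of a picture as a list of rows of brightness values, and the use of a union-find structure to merge components are all assumptions made for the example.

```python
import math

def extract_markers(picture, threshold, dead_pixels=frozenset(), max_markers=4):
    """Return up to max_markers tuples (x, y, size), largest component first.

    picture is a list of rows of brightness values (an assumed format);
    dead_pixels is a set of (x, y) positions that are always discarded.
    """
    parent = {}  # union-find over segment labels
    stats = {}   # root label -> [pixel_count, sum_of_x, sum_of_y]

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
            for i in range(3):
                stats[ra][i] += stats[rb][i]
            del stats[rb]
        return ra

    def is_marker_pixel(row, x, y):
        # Step 1: binary threshold, ignoring dead pixels.
        return row[x] > threshold and (x, y) not in dead_pixels

    prev_segments = []  # (x_start, x_end, label) found on the previous line
    next_label = 0
    for y, row in enumerate(picture):
        segments = []
        x = 0
        while x < len(row):
            if not is_marker_pixel(row, x, y):
                x += 1
                continue
            # Step 2: extract one connected segment of marker pixels.
            start = x
            while x < len(row) and is_marker_pixel(row, x, y):
                x += 1
            end = x - 1
            n = end - start + 1
            label = next_label
            next_label += 1
            parent[label] = label
            stats[label] = [n, n * (start + end) / 2.0, float(n * y)]
            # Group with overlapping segments of the previous line.
            for p_start, p_end, p_label in prev_segments:
                if p_start <= end and p_end >= start:
                    label = union(p_label, label)
            segments.append((start, end, label))
        prev_segments = segments

    # Step 3: keep the largest components; radius is sqrt(c / pi).
    largest = sorted(stats.values(), key=lambda s: s[0], reverse=True)[:max_markers]
    return [(sx / c, sy / c, math.sqrt(c / math.pi)) for c, sx, sy in largest]
```

Accumulating the pixel count and coordinate sums while segments are merged means the centre of mass and radius are available without a second pass over the picture.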
The drumstick markers may be dispensed with. The computer program may implement a segmentation algorithm to isolate the pixels belonging to the drumsticks, then fit a model (e.g. a line segment) to each resulting connected component. The position of a virtual marker can then be inferred from the configuration of the model in each picture (e.g. end of the segment). The number of pixels in each stick connected component may be used as the virtual marker's size. Such an approach may become the most practical as the characteristics of digital cameras improve with technological progress.
Marker Identification Algorithm
After the markers have been extracted from the current picture, the computer program executes the following algorithm to identify the nature of each marker (i.e. left hand, right hand, left foot, right foot):
x_second_to_last, y_second_to_last and s_second_to_last are the coordinates and size of the marker in the second to last picture.
y_hand=y_min+(y_max−y_min)/4,
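$$d_{mi,mj} = (x_{\text{previous},mj} + dx_{\text{previous},mj} - x_{\text{current},mi})^2 + (y_{\text{previous},mj} + dy_{\text{previous},mj} - y_{\text{current},mi})^2 + W_s^2\,(s_{\text{previous},mj} + ds_{\text{previous},mj} - s_{\text{current},mi})^2$$
$$d_{mi,mj} = (x_{\text{previous},mj} - x_{\text{current},mi})^2 + (y_{\text{previous},mj} - y_{\text{current},mi})^2 + W_s^2\,(s_{\text{previous},mj} - s_{\text{current},mi})^2$$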
In both formulas, the suffixes _mi and _mj are used to refer respectively to the attributes of the current picture hand marker mi and of the previous picture hand marker mj.
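The two distance formulas can be illustrated with a short sketch. It rests on two assumptions that the text does not state outright: the weight term is read as the square of a size weight W_s, and the second formula is taken to apply when the previous marker was not present in the second to last picture, so that no change in position or size is available. The function name `marker_distance` is hypothetical.

```python
def marker_distance(current, previous, second_to_last=None, w_s=1.0):
    """Distance d_mi_mj between a current marker and a previous marker.

    Each marker is an (x, y, size) tuple. When second_to_last is given, the
    previous marker is first extrapolated by its last observed change
    (first formula); otherwise it is used as is (second formula, assumed
    to cover markers with no second to last observation).
    """
    x_c, y_c, s_c = current
    x_p, y_p, s_p = previous
    if second_to_last is not None:
        x_o, y_o, s_o = second_to_last
        dx, dy, ds = x_p - x_o, y_p - y_o, s_p - s_o
    else:
        dx = dy = ds = 0.0
    return ((x_p + dx - x_c) ** 2
            + (y_p + dy - y_c) ** 2
            + w_s ** 2 * (s_p + ds - s_c) ** 2)
```

For a marker moving at constant velocity, the extrapolated form gives a distance of zero to its true successor, which is what makes the prediction useful for telling two fast-moving hand markers apart.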
1. There was not a single distance d_mi_mj to compute: either there is no hand marker in the current picture, in which case the identification problem is trivial, or there were no hand markers in the previous picture. In that case, if there are two hand markers in the current picture, the one whose x coordinate 703 is highest is identified as the left hand marker and the other one as the right hand marker. If there is only one hand marker in the current picture, it is identified as the left hand marker if its x coordinate 703 is greater than a certain value x_handedness, and as the right hand marker if not.
F) Perform steps D and E above, substituting the word ‘hand’ with the word ‘foot’.
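The fallback case described above, in which no distance can be computed because the previous picture contained no hand markers, might look like the following sketch. The function name `identify_without_history` and the dictionary return format are illustrative assumptions; the same logic would apply to foot markers under step F.

```python
def identify_without_history(markers, x_handedness):
    """Assign 'left'/'right' to 0, 1 or 2 markers given as (x, y, size).

    With two markers, the one with the highest x coordinate is the left one.
    With one marker, it is the left one if its x exceeds x_handedness.
    """
    if len(markers) == 2:
        left, right = sorted(markers, key=lambda m: m[0], reverse=True)
        return {"left": left, "right": right}
    if len(markers) == 1:
        side = "left" if markers[0][0] > x_handedness else "right"
        return {side: markers[0]}
    return {}  # no marker in the current picture: nothing to identify
```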
Possible Refinements of Marker Identification Algorithm
The computer program may implement the following heuristic to further enforce the correct identification of a hand marker as corresponding to the left or right hand.
1. If a marker is currently identified as a right hand marker and its x coordinate 703 becomes greater than a pre-defined value x_rightlimit, then it becomes identified as a left hand marker.
2. If a marker is currently identified as a left hand marker and its x coordinate 703 becomes lower than a pre-defined value x_leftlimit, then it becomes identified as a right hand marker.
3. If a marker swaps identity because of step 1 or 2, the computer program does not reset its position and size history, but transfers it to its new identity. Additionally, if another hand marker was present, its identity is similarly swapped.
The heuristic above may comprise a check of what drum kit element is deemed reachable by a specific hand. For example, the drumstick held in the left hand is deemed to be usable to hit all drum elements except for the ride cymbal and floor tom. If that check fails for any drum hit for a given hand marker, then the hand marker is swapped as above.
To deal with the case where two hand markers overlap in the current picture, the computer program implements the following algorithm, which is run for each picture before the marker identification algorithm:
1. If a single hand marker was found in the current picture, and if two hand markers were present in the previous and in the second to last picture, compute a distance d′ according to the following formula:
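$$d' = \big(x1_{\text{previous}} + dx1_{\text{previous}} - (x2_{\text{previous}} + dx2_{\text{previous}})\big)^2 + \big(y1_{\text{previous}} + dy1_{\text{previous}} - (y2_{\text{previous}} + dy2_{\text{previous}})\big)^2 + W_s^2\,\big(s1_{\text{previous}} + ds1_{\text{previous}} - (s2_{\text{previous}} + ds2_{\text{previous}})\big)^2$$
where x1_previous, dx1_previous etc. are defined as
$$dx_{\text{previous}} = x_{\text{previous}} - x_{\text{second\_to\_last}}, \quad dy_{\text{previous}} = y_{\text{previous}} - y_{\text{second\_to\_last}}, \quad ds_{\text{previous}} = s_{\text{previous}} - s_{\text{second\_to\_last}},$$
where x_second_to_last, y_second_to_last and s_second_to_last are the coordinates and size of the marker in the second to last picture, with 1 indicating the first marker and 2 the second marker.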
2. If d′ is lower than a predefined value d_overlap:
In some examples it is assumed that foot markers never overlap during a drumming session.
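The overlap test on hand markers described above can be sketched as follows. The sketch only computes d′ and compares it to d_overlap; what the program does once an overlap is predicted is not shown. The function name `predicted_overlap` is hypothetical and the weight is again read as the square of W_s.

```python
def predicted_overlap(m1_prev, m1_old, m2_prev, m2_old, d_overlap, w_s=1.0):
    """Return True if two hand markers are predicted to coincide.

    m*_prev and m*_old are (x, y, size) tuples from the previous and the
    second to last picture. Each marker is extrapolated by one picture,
    i.e. previous + (previous - second_to_last), before d' is computed.
    """
    def predict(prev, old):
        return tuple(2 * p - o for p, o in zip(prev, old))

    x1, y1, s1 = predict(m1_prev, m1_old)
    x2, y2, s2 = predict(m2_prev, m2_old)
    d_prime = (x1 - x2) ** 2 + (y1 - y2) ** 2 + w_s ** 2 * (s1 - s2) ** 2
    return d_prime < d_overlap
```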
The computer program analyses the evolution of each identified marker's position and size over time to determine what sounds to play, at what time and at what volume.
There is always an upwards arming motion 801 before the swing, followed by the downward swing 802, followed by a sudden immobilisation of the marker. There cannot be a new intended drum hit without the y coordinate 704, 804 having increased first (arming 801). The y coordinate 704, 804 also has to have decreased for at least a pre-defined number min_n_swing of consecutive pictures (swing 802), and the speed of the marker during the swing has to have exceeded a certain pre-defined minimum speed value S_min. The hit then occurs at the time of the local minimum 803 of the y coordinate 704, 804, that is, at the time 805 of the first picture 806 at which the y coordinate 704, 804 is equal to or lower than what it is in the next picture. The drum sound corresponding to the hit is played as soon as the computer program detects it, that is, at the time of the next picture.
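The arming, swing and local-minimum conditions can be illustrated with a small sketch over the sequence of y coordinates of one marker. It deliberately omits the speed condition, and the function name `detect_hits` and the batch (rather than picture-by-picture) formulation are assumptions made for brevity.

```python
def detect_hits(ys, min_n_swing):
    """Return the indices of the pictures at which a drum hit occurs.

    ys holds the y coordinate of one marker, one value per picture.
    A hit at picture i - 1 is reported while processing picture i, which
    mirrors the one-picture detection delay described in the text.
    """
    hits = []
    armed = False  # y has increased since the last hit (arming motion)
    n_swing = 0    # consecutive pictures over which y decreased (swing)
    for i in range(1, len(ys)):
        if ys[i] < ys[i - 1]:
            n_swing += 1
            continue
        # ys[i - 1] <= ys[i]: picture i - 1 is a local minimum of y.
        if armed and n_swing >= min_n_swing:
            hits.append(i - 1)
            armed = False
        if ys[i] > ys[i - 1]:
            armed = True  # the upward motion arms the next hit
        n_swing = 0
    return hits
```

For example, `detect_hits([0, 1, 2, 3, 2, 1, 0, 0.5, 1], 2)` reports a single hit at index 6, whereas a sequence that only descends produces no hit because no arming motion preceded the swing.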
The position and the size of the marker at the time of the hit are used by the computer program to determine which drum was hit and therefore what type of drum sound to play.
For a hand marker, the process is as follows:
$$D = \sqrt{(x_m - x_d)^2 + (y_m - y_d)^2 + W_s\,(s_m - s_d)^2},$$
Through this process, embodiments of the disclosure may enable the user to express their intention to hit one drum or the other even if their pre-defined positions within the picture are identical, provided that the expected marker sizes are sufficiently different. An example of this case is when the camera is facing the user: for a drum hit directly in front of the user, the marker size is small if the hit occurs near the user (i.e. far from the camera, arm is folded) and large if the hit occurs far from the user (i.e. near the camera, arm is extended). By using a small pre-defined expected marker size for a tom and a large expected marker size for a cymbal, they can both be placed in front of the user, in a line with the camera, and still allow the user to express which of them they intend to hit.
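A sketch of how the distance D might be used to pick a drum is given below. It assumes that the drum with the smallest D is the one selected, which the surrounding text implies but does not spell out; the function name `select_drum`, the dictionary of drum attributes and the example values are hypothetical.

```python
import math

def select_drum(marker, drums, w_s=1.0):
    """Return the name of the drum whose attributes are nearest the marker.

    marker is (x_m, y_m, s_m) at the instant of the hit; drums maps a drum
    name to its pre-defined (x_d, y_d, s_d). Choosing the minimum D is an
    assumption of this sketch.
    """
    x_m, y_m, s_m = marker

    def distance(attrs):
        x_d, y_d, s_d = attrs
        return math.sqrt((x_m - x_d) ** 2 + (y_m - y_d) ** 2
                         + w_s * (s_m - s_d) ** 2)

    return min(drums, key=lambda name: distance(drums[name]))

# A tom and a cymbal share the same picture position but differ in
# expected marker size, as in the example of the camera facing the user.
kit = {"tom": (320.0, 240.0, 3.0), "cymbal": (320.0, 240.0, 9.0)}
```

With this kit, a hit at (320, 240) with marker size 4 selects the tom and the same position with marker size 8 selects the cymbal.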
Foot Drums
In the case where there are only two foot drums, e.g. a hi-hat pedal and a bass pedal, the computer program uses the identity of the marker (left foot or right foot, see Marker Identification Algorithm) to determine which drum is hit.
In the case where one foot controls multiple drums, e.g. a hi-hat pedal and a second bass drum pedal, for the relevant foot marker (e.g. left foot), the foot drums are assigned mutually exclusive pre-defined intervals of x coordinates 703. When a drumming gesture (drum hit) occurs for a foot marker, the computer program determines which interval the x coordinate of the marker belongs to, and thus which drum was hit and what type of drum sound to play.
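Purely as an example, the lookup of a foot drum from mutually exclusive intervals of x coordinates might be written as below. The function name foot_drum_for_x, the drum names and the half-open interval convention [lo, hi) are assumptions of this sketch.

```python
def foot_drum_for_x(x, intervals):
    """Return the foot drum whose x interval contains x, or None if none does.

    intervals maps a hypothetical drum name to (lo, hi); each interval is
    treated as half-open [lo, hi) so that adjacent intervals never overlap.
    """
    for name, (lo, hi) in intervals.items():
        if lo <= x < hi:
            return name
    return None


# Left foot controlling a hi-hat pedal and a second bass drum pedal.
left_foot = {"hi_hat_pedal": (0, 120), "second_bass_pedal": (120, 240)}
print(foot_drum_for_x(130, left_foot))
```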
Determining Properties of Sound Played
The positions and the sizes of the marker during the swing part 802 of the drumming gesture are used by the computer program to refine the nature of the drum sound to play and determine how loud to play it. This lets the user express the accents of their drum hits by making wide and fast, or small and slow drumming gestures.
For a given marker, the swing part 802 of the drumming gesture is defined as the interval between the last local maximum 810 of the y coordinate 704, 804 of the marker and the current local minimum 803 that represents the current potential drum hit. A record is kept of the positions and sizes of the marker during its last swing phase: that record is re-initialised upon the first decrease of the y coordinate 704 of the marker after a series of increases.
Upon the first increase of the y coordinate after a series of decreases (swing 802), the record of positions and sizes of the marker for each picture of the swing phase is processed to obtain a marker speed S according to the following formula:
$$S = \frac{\sqrt{(x_{end} - x_{start})^2 + (y_{end} - y_{start})^2 + W_s^2 (s_{end} - s_{start})^2}}{n_{swing}}$$
The swing speed S may be computed in a different manner, for example by summing pairwise Euclidean distances between positions of the marker in consecutive pictures, summing this with a weighted marker size difference between start and end picture, and dividing by n_swing, the number of pictures comprising the swing phase 802.
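The two ways of computing the swing speed S may, for illustration, be sketched as follows. The swing record is assumed to be a list of (x, y, size) triples, one per picture of the swing phase, and n_swing is assumed to equal the length of that record. In the second function the weighted size difference is assumed to be w_s multiplied by the absolute size difference. The function names are hypothetical.

```python
import math


def swing_speed(record, w_s):
    """Speed S from the first and last pictures of the swing record."""
    (x0, y0, s0), (x1, y1, s1) = record[0], record[-1]
    n_swing = len(record)  # assumption: one record entry per swing picture
    return math.sqrt(
        (x1 - x0) ** 2 + (y1 - y0) ** 2 + (w_s * (s1 - s0)) ** 2
    ) / n_swing


def swing_speed_path(record, w_s):
    """Alternative S: path length over consecutive pictures plus size term."""
    path = sum(math.dist(a[:2], b[:2]) for a, b in zip(record, record[1:]))
    size_term = w_s * abs(record[-1][2] - record[0][2])  # assumed weighting
    return (path + size_term) / len(record)
```

For a straight swing with constant marker size the two functions return the same value; they differ when the marker follows a curved path, since the second variant measures the full distance travelled.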
Each drum is given a pre-defined minimum speed value S_min and a pre-defined maximum speed value S_max. For a potential drum hit for a given marker (end of drumming gesture), if the computed speed S is lower than the relevant S_min, the gesture is not registered as an actual hit and no sound is played.
If S is greater than or equal to S_min and lower than or equal to S_max, a volume coefficient Vc is computed according to the following formula: Vc=(S−S_min)/(S_max−S_min). This volume coefficient, which is a value between 0 and 1, is used to weight (by multiplication) the relative volume of the drum sound played. It may also be used to determine the nature of the drum sound in the manner described below with reference to the Drum Sound Collection.
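A minimal sketch of the hit registration and volume coefficient follows. The text above does not state what happens when S exceeds S_max; clamping Vc to 1 in that case is an assumption of this sketch, as is the function name volume_coefficient and the use of None to signal that no hit is registered.

```python
def volume_coefficient(s, s_min, s_max):
    """Return Vc in [0, 1], or None when the gesture is not registered as a hit."""
    if s < s_min:
        return None  # too slow: no hit, no sound
    if s > s_max:
        return 1.0   # assumption: speeds above S_max play at full volume
    return (s - s_min) / (s_max - s_min)


print(volume_coefficient(5.0, 2.0, 10.0))  # (5 - 2) / (10 - 2) = 0.375
```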
Drum Sound Collection
Each drum is represented by a collection of drum sounds that have been pre-recorded in a studio environment. One aspect of this collection is that, for a specific drum, different recordings are made corresponding to different drumming accents (how fast and hard the drum is hit). Let Na be the number of pre-recorded accents for the drum being hit. The computer program computes a series of Na intervals (I_1, I_2, . . . , I_Na) as follows:
$$I_1 = [0, 1/N_a),\quad I_2 = [1/N_a, 2/N_a),\quad \ldots,\quad I_{N_a} = [(N_a - 1)/N_a, 1]$$
The computer program then computes which interval I_i the volume coefficient Vc belongs to, and plays the corresponding sound for that drum (i.e. sound number i). Another aspect of the sound collection for a specific drum is that it contains recordings corresponding to drum hits with the dominant hand and recordings corresponding to drum hits with the non-dominant hand. The computer program tracks which hand a marker corresponds to (see Marker Identification Algorithm), and plays the corresponding sound.
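For illustration, the interval index i for a given volume coefficient Vc can be obtained without building the intervals explicitly. The function name accent_index is hypothetical; the result is the 1-based sound number i.

```python
def accent_index(vc, na):
    """Return i such that vc lies in I_i, for vc in [0, 1] and na accents.

    Every interval is half-open except the last, which includes 1, so the
    value vc == 1 is mapped to the last interval rather than to na + 1.
    """
    return min(int(vc * na), na - 1) + 1


print(accent_index(0.5, 4))  # 0.5 lies in I_3 = [2/4, 3/4)
```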
Another aspect of the sound collections is that different versions of each drum sound are stored corresponding to different reverberation configurations. This is achieved by applying different levels of reverb effect to each drum sound recording. Alternatively, this may be achieved by recording the sounds in different physical environments (e.g. house room, theatre). The computer program provides an interface for the user to choose the reverberation configuration in which they wish to play. This configuration can be chosen for all drums at once or for each drum individually.
The sound recordings are normalised in volume to allow for consistent volume gradation when applying the volume coefficients Vc of different drum hits.
For each drum d the computer program uses a variable Vd within the [0,1] interval to represent its relative loudness with respect to the other drums. This coefficient is applied (multiplication) after the drum hit specific volume coefficient Vc is applied.
The computer program provides pre-set values for the Vd of each available drum, as well as an interface to allow the user to adjust each Vd.
When a foot marker's x coordinate 703 is within an interval corresponding to a hi-hat cymbal drum element, the position of that foot marker is processed by the computer program to determine the openness of the hi-hat in the following manner:
The computer program keeps a record of two integer variables hh_min and hh_range. The computer program computes a hi-hat openness value o by examining the y coordinate 704 hh_y of the hi-hat foot marker (see above): if hh_y is lower than hh_min, o=0; if hh_y is greater than hh_min+hh_range, o=1; otherwise, o=(hh_y−hh_min)/hh_range.
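The openness computation may be sketched as below. The function name hihat_openness is hypothetical, and hh_range is assumed to be strictly positive so that the division is well defined.

```python
def hihat_openness(hh_y, hh_min, hh_range):
    """Return the hi-hat openness o in [0, 1] from the foot marker's y coordinate."""
    if hh_range <= 0:
        raise ValueError("hh_range is assumed to be strictly positive")
    if hh_y < hh_min:
        return 0.0
    if hh_y > hh_min + hh_range:
        return 1.0
    return (hh_y - hh_min) / hh_range


print(hihat_openness(40, hh_min=20, hh_range=40))  # (40 - 20) / 40 = 0.5
```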
The drum sound collection for a hi-hat cymbal contains recordings of the hi-hat being hit with a drumstick at different levels of openness, as well as recordings of the hi-hat being closed with the foot at different speeds.
Let Nhh be the number of pre-recorded openness levels for the hi-hat. The computer program computes a series of Nhh intervals (Ihh_1, Ihh_2, . . . , Ihh_Nhh) as follows:
When the hi-hat cymbal is hit by a hand marker, the computer program computes which interval the openness value o belongs to, and picks the corresponding type of sound for that level of openness. The properties of the sound played are further determined according to the process set out above under “Determining Properties of Sound Played”.
The computer program determines the values for hh_min and hh_range during the drum kit configuration phase. When a foot marker is operating the hi-hat as defined above for “foot drums”, the computer program updates the values for hh_min and hh_range in the following manner:
When a hi-hat hit occurs with the foot marker, hh_min is set to the y coordinate 704 of the foot marker at the instant of the hit.
If the foot marker's y coordinate 704 hh_y is lower than hh_min then hh_min is set to hh_y. If the absolute value of the difference between the x coordinate 703 of the marker at the start of an arming phase 801 or swing phase 802 and its x coordinate 703 at the end of that phase is greater than a pre-defined value hh_side_slip, then hh_min is set to the y coordinate 704 of the marker at the end of that phase. If at the end of an arming phase 810 the y coordinate 704 of the marker hh_y is greater than hh_min plus hh_range plus a pre-defined value hh_front_slip, hh_min is set to hh_y.
When a foot marker begins operating the hi-hat as defined above for “foot drums” during an arming phase 801, hh_min is set to the y coordinate 704 of the marker at the end of the next swing phase 803 if it is still operating the hi-hat. In the meanwhile, the hi-hat is set to open: o=0.
When a foot marker begins operating the hi-hat (as defined above for “foot drums”) during a swing phase 802, hh_min is set to the y coordinate 704 of the marker at the end of the swing phase 803 if it is still operating the hi-hat. In the meanwhile, the hi-hat is set to open: o=0.
Calibration/Configuration
The computer program provides an interface to let the user calibrate the apparatus to match their drumming conditions. This interface comprises two phases. At the beginning of the first phase (placement phase), the computer program instructs the user to place the camera and lamp roughly 50 cm to the right of the computer screen if left handed, or to the left if right handed, and to point them roughly to the location where the user intends to drum, which should be on a line such that the user is facing the computer screen. The computer program then displays in real time the pictures captured by the camera. A number of pieces of visual information are displayed overlaid on top of the current picture:
1. Pixels that are too bright, called dead pixels, are displayed in semi-transparent red. Dead pixels correspond to parts of the drumming environment that are brighter than a marker would be, thus hindering the computer program's analysis of the position and size of any marker travelling within the corresponding area.
Dead pixels are computed in the following manner:
2. Dead pixel regions are annotated with text (and a sound or audio message may be played) according to the following algorithm:
3. Two semi-transparent rectangular boxes are displayed at the bottom of the picture. One is located one third from the left of the picture and annotated with the text: “feet location for right handed drumming”. The other is located one third from the right of the picture and annotated with the text: “feet location for left handed drumming”. The computer program instructs the user to pan and tilt the camera so as to cover the location where their feet will be when drumming with the relevant box. For example, if they are right handed, they may tilt the camera so that the box on the left is overlaid over the area in front of the feet of the chair where they intend to sit during the drumming session.
4. A button labelled “configure drums” or other text to that effect is displayed. When activated, the drum kit configuration phase (second phase of the calibration interface) begins. The drum kit configuration phase consists of the following consecutive steps:
$$D = \sqrt{(x_m - x_d)^2 + (y_m - y_d)^2 + W_s (s_m - s_d)^2},$$
x_d=x_placement,
y_d=y_placement,
s_d=s_placement
where (x_placement,y_placement) are the coordinates of the marker's position and s_placement its size at the end of the drumming gesture it reflected.
Once calibration is completed (e.g. at the end of the drum kit configuration phase), the drumming session may start. The user can drum by making drumming gestures at the appropriate locations and speeds to express their musical intent.
During the drumming session, the computer program displays a menu icon at a y coordinate i_y equal to y_hand (defined above) and a pre-defined x coordinate i_x. This icon is also given an expected marker size i_s that is smaller than all the expected marker sizes of the drum kit being played.
When the user makes a hand drumming gesture, the menu icon is checked for a “drum” hit as if it were another drum, using (i_x, i_y, i_s) as counterparts for (d_x, d_y, d_s) (defined above). To further avoid false positives, the icon is placed on the side of the non-dominant hand and the hit has to be performed with the dominant hand.
If the menu icon is hit, the computer program enters a menu mode in which the user can control different aspects of the program by making drum gestures. Each menu comprises a set of icons (or labelled areas) representing each option, as well as an icon to go one level up in the menu hierarchy, and an icon to exit the menu and return to drumming.
The icons are distributed evenly across the screen to make it easy for the user to discriminate between them by making drumming gestures, in the same fashion that they selected the menu icon.
Menu Options
Menu options may include:
When selecting menu options 4, 5, 6 or 7, or any option that would necessitate the input of a continuous value, the computer program checks if a marker enters a specific rectangular area of the picture. The x or y coordinate of the marker within that box is then used to adjust the value, as if using a slider.
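As one possible sketch of the slider behaviour, the marker coordinate inside the rectangular area may be mapped linearly onto the range of the value being adjusted. The linear mapping, the clamping at the box edges, and the names slider_value, box_start and box_length are assumptions of this sketch.

```python
def slider_value(coord, box_start, box_length, v_min, v_max):
    """Map a marker coordinate inside a box onto the range [v_min, v_max].

    coord is the marker's x or y coordinate; box_start and box_length
    describe the box along the same axis (hypothetical parameters).
    """
    t = (coord - box_start) / box_length
    t = max(0.0, min(1.0, t))  # clamp so the value stays within its range
    return v_min + t * (v_max - v_min)


print(slider_value(150, box_start=100, box_length=200, v_min=0.0, v_max=1.0))  # 0.25
```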
Alternatively, continuous values may be altered by repeatedly hitting specific icons, e.g. one to increase and another to decrease. The icons may be replaced or supplemented with auditory cues. The left-right panning of the sounds representing the menu items guides the user when deciding where to execute the drumming gesture to choose a specific item.
The combination of the apparatus, drumming gesture and menu navigation can be generalised to provide a human computer interface in any suitable setting, beyond the specific application as a percussion instrument.
During the drumming session, the computer program gives the user the option to switch the display to a sheet music rendering of what they have drummed so far, or have both the camera frames and the sheet music displayed at the same time. The sheet music is generated on the fly by the computer with each new hit, and accents are taken into account.
Sheet music generation can be stopped, resumed or started anew, and the results saved, printed or replayed.
The user can also edit the sheet music, in particular by clicking and dragging notes, which results in a real time update of the sheet music layout. Sheet music saved in the format used by the computer program can be loaded, displayed and played back. In this mode, a cursor indicates the current time location on the sheet music. If the user is playing along, their music is rendered on the fly under the current sheet music line.
By removing the need for physical surfaces while not compromising musical expressiveness, the present disclosure opens the way for a new way of drumming, akin to dancing, in which the user is not constrained in the way they can move.
This can be implemented if the camera, optional lamp and marker size are such that they allow coverage of a large drumming volume. To address occlusion issues arising when aiming at allowing more freedom of movement, a full 3D motion capture apparatus comprising multiple cameras may be used as a replacement for the part of the disclosure concerned with the recovery of marker position and size.
The description above provides some examples of the disclosure, and it is contemplated that the features of these examples may be combined with the embodiments specified in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
1119447.9 | Nov 2011 | GB | national |