The present invention generally relates to vibration transducers and, more particularly, to vibration transducers for developing electrical signals corresponding to the vibrations of planes or strings, which are particularly applicable to stringed musical instruments, especially for amplification and for control of other waveform processing such as by a music synthesizer.
Virtually all human cultures have developed devices of various types for making sounds for communication, signaling or aesthetic purposes. Among the most popular types of such devices are those which create controllable sound through the vibration of one or more strings. In general, stringed instruments comprise a structure for placing a string in tension and include a structure generally referred to as a bridge, which supports the string at an intermediate point and defines one end of the vibrational length of the string. The bridge transfers the vibration of the string to a membrane or other surface, sometimes part of a resonating structure, to increase the surface at which the vibration is coupled to the surrounding air. The other end of the vibrational length of the string is established by another support structure (sometimes referred to as a nut) for the string or a mechanism such as is used in some types of harp. In many string instruments, the vibrational length of the string can be modified by manually pressing an intermediate point on the string against another portion of the stringed instrument, such as a so-called fingerboard or fretboard, to establish a desired frequency of vibration or pitch produced by the string when it vibrates after being excited in some manner such as by plucking or strumming the string, bowing (e.g. rubbing a slightly adhesive material against the string to produce a substantially continuous sound) or by a mechanism that provides stretching, striking or plucking of the string.
Stringed instruments have been developed to have a large number of traditional shapes and sizes as well as different complements of strings which can be tuned to desired combinations of musical pitches or, in some instruments, to resonate with other notes or their harmonics (sometimes referred to as drone strings) to provide a wide variety of sound qualities. By the same token, the area of the instrument to which the string vibration is coupled, its properties in regard to propagation of the string vibration, and the acoustic qualities of any resonator structure largely determine the maximum volume of sound (e.g. the maximum energy that can be coupled to the surrounding air) that the instrument can produce and propagate to the listener, as well as the various qualities of the sound.
Since the practical size of the instrument is largely determined by the range of frequency or pitch it is to produce and the geometry of a vibrating string is limited by desirable vibrational modes, the volume of sound that can be produced by a single stringed instrument is often quite limited, leading, for example, to much greater numbers of stringed instruments than wind or percussion instruments and greater numbers of smaller, higher-pitched string instruments than larger, lower-pitched string instruments in orchestras and other musical ensembles.
However, many stringed instruments are used in performances by relatively small groups of musicians for relatively large audiences in large auditoriums or outdoor venues where electronic sound amplification is needed. In such circumstances it is important to capture only sounds from desired sources (e.g. voices or individual instruments) using individual transducers (in the case of musical instruments, sometimes referred to as pick-ups) in a highly selective manner so that suitable amplification or other audio signal processing of the signals from individual sources can be independently provided. While microphone-type transducers having relatively narrow reception angles or patterns may be suitable for voices, such transducers are less than optimal for stringed instruments since they do not adequately discriminate the sounds of the instrument from other ambient sounds. Therefore, it has been the traditional practice to directly detect the vibration of the strings of stringed instruments with electromagnetic transducers if electronic sound amplification is to be provided.
Additionally, once the vibration of the strings has been converted to an electrical signal suitable for electronic amplification, many qualities of the sound produced can be readily altered by suitable processing of the electrical signal. In general, for some musical styles, the more modification of the electrical signal representing the string vibration that is available, the less important the acoustic properties of the instrument will be. For this reason and others such as durability and the ease of mounting electromagnetic transducers thereon, so-called solid-body guitars and basses have become the standard for use in many popular music groups.
As a closely related issue, so-called music synthesizers are devices that can provide many different effects in modifying the waveform of an electrical signal to obtain a wide variety of different sounds by, for example, modifying the amplitude and phase of harmonic content of a waveform and the envelope of the waveform, either generally or over the duration of a tone. While early music synthesizers were complex special-purpose devices, at the present state of the art, their functions can be emulated by computers of relatively modest computing power and a number of software programs are commercially available to achieve such functions. Many music synthesizers intended for real time, live performance are controlled from a keyboard to provide input as to the musical notes upon which the synthesizer is to operate. However, many musicians who might wish to have the wide gamut of audio effects of which music synthesizers are capable may not be as facile or well-trained in the use of a keyboard as they are with other instruments, stringed instruments in particular, which require an entirely different type of physical dexterity than is required for a musical keyboard instrument.
On the other hand, for other musical styles, retaining the subtle qualities of the sound of particular instruments may be of high importance. In this regard, electromagnetic transducers damp string vibrations due to the attraction of the string to the magnetic field of the transducer as the string moves through it, and thereby significantly alter the sound qualities of stringed instruments. Moreover, electromagnetic transducers have a non-linear frequency response and require the use of steel strings on the instrument. The former is somewhat amenable to approximate electrical compensation; the latter is not, and would significantly alter the quality of sound if, in fact, the instrument is of a sufficiently robust structure to tolerate the installation of steel strings. Additionally, electromagnetic transducers are susceptible to noise from ambient electrical fields including, importantly, 60 Hz noise from electrical power distribution mains.
Attempts to avoid some of the problems associated with electromagnetic vibration transducers have involved phototransistors. However, those attempts have not been particularly successful since phototransistors also have a non-linear response to light intensity: the DC gain is a function of the collector current, which has a non-linear relationship to the base current, and the base current is a function of the light incident on the base junction, which varies with the frequency (and possibly the amplitude) of the vibration of a string. (For this reason, phototransistors are better suited for and more frequently used in switching circuits and wireless remote control arrangements.) Moreover, phototransistors used as transducers for string vibrations are susceptible to interference from ambient light, which may change rapidly and repeatedly during a performance, due either to particular lighting effects that may also be used during a performance or even to shadowing caused by the motions of the performer.
It is therefore an object of the present invention to provide an optical transducer for string vibration that exhibits a linear frequency response and can be constructed in a modular form that can be easily attached to a wide variety of string instruments in a manner that does not interfere with the playing of the instrument or significantly alter the acoustic properties of the instrument.
It is another object of the invention to provide a vibration transducer that can be applied to virtually any vibrating system and which has minimal effect on the vibrating system being observed.
It is a further object of the invention to provide an interface from a transducer for string vibrations to a musical instrument digital interface (MIDI) control signal.
In order to accomplish these and other objects of the invention, a vibration transducer is provided comprising a source of substantially collimated light, a photodiode having a limited light reception angle corresponding to the width of a light beam emitted from the source of substantially collimated light, a board for mounting the light source and the photodiode in a generally coaxial orientation, and an amplifier circuit having a transimpedance configuration.
In accordance with another aspect of the invention, a converter for converting a pitch frequency to a MIDI control signal is provided comprising an analog to digital converter, a fast Fourier transform processor, a peak detector, and a look-up table for correlating a frequency of a peak detected by the peak detector to an identification of a musical note.
In accordance with a further aspect of the invention, a signal converter for converting an analog waveform into a digital code is provided comprising a plurality of neural networks connected in cascade and receiving a spectrum of said waveform as an input, wherein each of said plurality of neural networks provides a binary classification of a frequency component of said spectrum of said waveform.
The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:
Referring now to the drawings, and more particularly to
Referring now to
To make a photodiode useful as a vibration sensor that must be able to track a wide range of frequencies, its output current must be converted into a voltage that is large enough, in the case of a guitar string transducer, to drive a guitar amplifier. Using an operational amplifier in a transimpedance configuration can achieve this goal, and with careful selection of the amplifier, the current to voltage conversion can be achieved at high speeds and over a large bandwidth.
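As a rough illustration of the current-to-voltage conversion described above, the following sketch computes the output voltage magnitude of an ideal transimpedance stage as the product of photocurrent and feedback resistance. The 10 µA photocurrent swing and the 100 kΩ feedback resistor are hypothetical values chosen only for illustration; they are not taken from this disclosure, and a real amplifier would also be limited by its bandwidth and noise.

```python
def transimpedance_output(photocurrent_a: float, feedback_ohms: float) -> float:
    """Ideal transimpedance stage: |Vout| = I_photo * R_feedback.

    The sign inversion of the usual inverting configuration is ignored.
    """
    return photocurrent_a * feedback_ohms


# Hypothetical example values (assumptions, not from the disclosure).
swing_a = 10e-6      # peak photocurrent change as the string shadows the beam
r_feedback = 100e3   # feedback resistor of the operational amplifier

v_out = transimpedance_output(swing_a, r_feedback)
print(f"{v_out:.2f} V peak")
```

Under these assumed values the feedback resistor alone sets the conversion gain, which is why its value is normally chosen together with the bandwidth of the amplifier.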
Since the sensor will be used for audio purposes, component selection with respect to noise, bandwidth and response speed is relatively critical. However, photodiodes having sufficient response speed and bandwidth are commercially available. Generally, noise can be limited and gain and responsiveness enhanced by choosing a photodiode having a viewing angle that corresponds closely to the diameter of the string with which it is associated at a given small distance. If the viewing angle is too small, small vibrations will not modulate the light (e.g. the light from a light source will always be blocked) and if too large, there will be some level of constant illumination whether the string is vibrating or not and the changes due to vibration will be relatively smaller and even more difficult to detect. Additionally, improved resistance to interference from ambient light can be provided by spectral filtering to a given band of optical wavelengths as depicted at 27 of
In general, a photodiode should be chosen to have a response time of less than 0.30 μsec or as dictated by the upper end of the range of frequencies to be detected. Also, at the present time, a monolithic module (illustrated at 23 of
Since the problems with traditional magnetic pickups include noise and a non-linear frequency response, it is important not to introduce unnecessary sources of noise and distortion through circuit components peripheral to the photodiode. By using a light source (e.g. a light-emitting diode or LED) coupled to a photo-detector (photodiode) PD in a “line of sight”, generally coaxial configuration as shown in
An excited string will usually vibrate with an amplitude comparable to its own diameter or greater as depicted at 21 and arrow 22 of
The transducer mount comprises a U-shaped bracket that is affixed atop the guitar or other instrument over the strings at a desired location along the strings but preferably close to the bridge that secures one end of the strings. The location along the strings is important in that the harmonic content of the vibrational modes of the string will be greater toward the end of the string whereas the fundamental frequency component will be increased farther from the nut. The circuit board is mounted in the mount to allow the photodiode and light source to hang between adjacent strings when the modular transducer is mounted on the musical instrument as indicated in
A schematic diagram of the circuitry of the modular transducer is illustrated in
The outputs of all of the channels are then connected to a mixing or summing amplifier, each through a series-connected resistor 45 and capacitor 44. The relative values of resistors 45 determine the relative gain of the respective channels while the capacitors 44 serve to filter low frequencies (e.g. below the frequency that can be produced by the corresponding “open” (e.g. full length) string). Resistors 46, 47 and 48 (collectively depicted at 25 of
Thus, the optical transducer converts individual string vibrations into an electrical signal by modulating a light source incident on a photodiode. Traditionally, string vibrations are converted to an electrical signal by modulating the magnetic field of a magnetic transducer. Use of a photodiode in combination with an impedance-controlling amplifier stage has several improvements over traditional magnetic transducers. As alluded to above, magnetic transducers have a non-linear frequency response, whereas the optical transducer using a combination of a photodiode and an impedance controlling amplifier stage has a rapid response as shown in
Magnetic transducers are also susceptible to hum from the electromagnetic field produced by the power mains, while the optical transducers are not. One last advantage over magnetic transducers is that, while magnetic transducers dampen the vibrations of the strings by magnetically pulling on the metal string of a guitar, optical transducers allow a string to vibrate naturally as shown in
This design also differs from current optical pickups by using photodiodes instead of phototransistors, which allows for a linear frequency response. This property allows the analog signal to be easily converted into a digital signal, a capability that current systems do not possess. The design can also be, and preferably is, manufactured as a modular system that can be mounted on virtually any guitar or other string instrument with little, if any, modification of the instrument. The optical transducer in accordance with the invention provides a far more natural sound with high fidelity to the string instrument itself, and provides a far cleaner signal in regard to pitch, with little distortion at frequencies other than the natural vibrational modes of the string in the environment of a particular instrument.
For control of a music synthesizer, in addition to the controls of the waveform and envelope applied to the notes, a signal identifying the specific pitch or pitches of the notes to which such controls are to be applied is required. Arrangements for automatic music transcription have much the same requirements for input of pitch information. A piano-type keyboard is generally used to provide such signals since such a device provides a separate individual switch for each pitch to be produced. However, most types of musical instruments do not provide such a unique correspondence between elements of the instrument and the pitch to be produced. With string instruments, in particular, even though the vibrational length of an individual string may be determined by the user pressing a portion of a given string against a fret with which a switch could theoretically be associated, a given fret will correspond to a plurality of strings (six for a traditional guitar) that are tuned to different pitches and, moreover, different or unconventional tunings of individual strings may be used. Therefore, if a stringed instrument is to be used to control a music synthesizer, signals indicative of particular pitches must be derived from the pitch acoustically produced by a string.
Pitch detection is one of the oldest and most studied problems in musical signal processing. Many methods have been proposed, including time-domain, frequency-domain, and statistical techniques.
An important application is pitch to MIDI (Musical Instrument Digital Interface) control signal conversion. MIDI is a digital control protocol typically used to control both analog and digital music synthesizers such as electronic keyboard instruments sometimes referred to simply as keyboards. Pitch to MIDI control signal converters must take an analog signal, such as an output from an electric guitar, and perform some process in order to estimate the fundamental frequency of the note being played and relate one of the 128 notes in the MIDI protocol (a range of almost eleven octaves) to that frequency. Fundamental frequencies can very often be weaker than higher order harmonics or missing altogether in the output of magnetic transducers, as discussed above in connection with
In this regard, substantial challenges are presented by the form of the distribution of musical notes with respect to frequency. Musical notes are logarithmically distributed such that the frequency of musical notes is doubled or halved with each octave above or below a given note, respectively. Adjacent chromatic tones differ from each other in frequency by just under 6%. Thus, low pitches are only a few hertz apart while high pitches are hundreds of hertz apart. This requires frequency domain techniques to have a large block size to obtain a sufficient resolution at low frequencies, which is computationally expensive. One method that has been used to linearize the pitch distribution is to take the Fast Fourier Transform of the logarithm of the transformed signal; this is known as Cepstrum analysis. Again, this proves to be computationally expensive. Perhaps more importantly in the context of real time performance of music, unless substantially greater processing power is available than could be provided by a personal computer of current design, computationally intensive processing causes audibly detectable delays that are highly objectionable.
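The resolution problem described above can be made concrete with a short calculation. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz at MIDI note 69 (a common convention, not stated in this disclosure) and estimates the smallest power-of-two block whose frequency bin width does not exceed the gap between a note and its upper chromatic neighbor. The function names are illustrative only.

```python
SEMITONE = 2 ** (1 / 12)  # ratio between adjacent chromatic notes, about 1.0595


def midi_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """Frequency of a MIDI note number, assuming A4 = MIDI note 69."""
    return a4_hz * 2 ** ((note - 69) / 12)


def spacing_hz(note: int) -> float:
    """Gap in hertz between a note and the next chromatic note above it."""
    return midi_to_hz(note) * (SEMITONE - 1)


def min_fft_size(note: int, sample_rate: float) -> int:
    """Smallest power-of-two block whose bin width is no wider than the gap."""
    n = 1
    while sample_rate / n > spacing_hz(note):
        n *= 2
    return n


for note in (40, 71):  # lowest and highest notes of a 32-note guitar range
    print(note, round(midi_to_hz(note), 1), round(spacing_hz(note), 2),
          min_fft_size(note, 44100.0))
```

At a 44.1 kHz sampling rate this estimate calls for a block several times larger at the bottom of the range than at the top, which is the source of the computational cost and latency noted above.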
Several methods have been proposed that use artificial neural networks as some form of pitch estimator. They are typically concerned with detecting pitch in prerecorded music for music database analysis and automatic transcription processing where response time is relatively unimportant. Neural networks can also be highly processing intensive.
The invention provides a pitch to MIDI control signal conversion which accommodates polyphonic (e.g. detection of plural simultaneous pitches) conversion quite easily in the case of stringed instruments such as a guitar. Specifically, referring to
As indicated above, this first embodiment of a pitch to MIDI converter has the advantage of allowing polyphonic pitch to MIDI conversion while being sufficiently small to be included in the modular optical transducer described above. However, at the present state of the art, response speed is deemed somewhat marginal in the prototypes constructed to date.
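The chain of elements summarized earlier for this type of converter (analog to digital conversion, a fast Fourier transform, a peak detector, and a look-up table relating the peak frequency to a note) can be sketched for a single string channel as follows. This is a minimal illustration under stated assumptions rather than the implementation of the embodiment: the list of samples stands in for the output of the analog to digital converter, the strongest spectral peak is taken as the pitch, the table assumes equal temperament with A4 = 440 Hz, and all function names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def build_note_table(lo=40, hi=71, a4_hz=440.0):
    """Look-up table of (frequency in Hz, MIDI note number) pairs."""
    return [(a4_hz * 2 ** ((n - 69) / 12), n) for n in range(lo, hi + 1)]


def pitch_to_midi(samples, sample_rate, table):
    """Return the MIDI note whose table frequency is nearest the spectral peak."""
    half = len(samples) // 2
    mags = [abs(c) for c in fft(samples)[:half]]
    peak_bin = max(range(1, half), key=mags.__getitem__)  # skip the DC bin
    peak_hz = peak_bin * sample_rate / len(samples)
    # Compare on a logarithmic scale, since notes are spaced by ratio.
    return min(table, key=lambda e: abs(math.log(e[0] / peak_hz)))[1]


# Synthetic 220 Hz tone standing in for one digitized string channel.
sample_rate, n = 8192, 4096
tone = [math.sin(2 * math.pi * 220.0 * i / sample_rate) for i in range(n)]
note = pitch_to_midi(tone, sample_rate, build_note_table())
print(note)
```

Because each string has its own channel, running one such chain per string is one way the polyphonic case could be handled; picking the strongest peak is a simplification that would fail wherever a harmonic is stronger than the fundamental.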
Much more rapid response speed for pitch to MIDI conversion is provided by a second embodiment of a pitch to MIDI converter which will now be described. The neural network method proposed in this second and currently preferred embodiment has several advantages over traditional methods. The first and most important advantage is that searching for a specific frequency or period is not necessary since the network is trained to recognize relationships between all data points in a single block to classify the data. Because of this, the frequency spectrum need not be of as high a quality as traditional methods require, which allows faster processing. Another advantage over traditional methods is the natural parallel structure of neural networks, which can allow for efficient implementation on hardware or FPGA (Field Programmable Gate Array) devices, though because of the computational efficiency of the networks, common microcontrollers or DSP devices are typically suitable.
To estimate pitch with high accuracy within a response time that is not audibly detectable, the invention provides a cascaded feed-forward back-propagation neural network whose inputs consist of the frequency spectrum of a guitar input. This method is based on the idea of intelligent neural networks that have the property of being more accurate than traditional neural networks. Intelligent neural networks are cascaded neural networks that first classify an input to a group of possible outputs and continue to reclassify the input to smaller groups until only a few possible outputs remain that the input may belong to. The cascaded neural network in accordance with the invention is unique in that it is based on a power of two decision so that the groups of possible outputs are segmented using only a power of two (32, 16, 8, etc.), so that all decisions, including the final decision, are binary. Ten-fold cross validation was used for network training and selection to ensure that the network model was appropriate and that no overfitting had occurred.
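The power-of-two narrowing described above can be illustrated with a small sketch in which a group of 32 candidate notes is halved five times until one note remains. The `decide` callable stands in for the trained binary network at each stage; the `energy_decide` function used in the demonstration is a deliberately trivial stand-in (it compares summed magnitudes in a toy 32-element spectrum) and is not a neural network. All names, and the way stages are chosen, are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def cascade_classify(
    spectrum: Sequence[float],
    notes: Sequence[int],
    decide: Callable[[Sequence[float], List[int], List[int]], int],
) -> int:
    """Repeatedly split the candidate group in half and keep the chosen half.

    `decide` returns 0 to keep the lower half and 1 to keep the upper half.
    """
    candidates = list(notes)
    if not candidates or len(candidates) & (len(candidates) - 1):
        raise ValueError("number of candidate notes must be a power of two")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        lower, upper = candidates[:mid], candidates[mid:]
        candidates = upper if decide(spectrum, lower, upper) else lower
    return candidates[0]


def energy_decide(spectrum, lower, upper, base=40):
    """Toy stand-in for a binary network: pick the half with more energy."""
    lo_e = sum(spectrum[n - base] for n in lower)
    hi_e = sum(spectrum[n - base] for n in upper)
    return 1 if hi_e > lo_e else 0


notes = list(range(40, 72))   # 32 candidate MIDI notes
toy = [0.0] * 32              # one magnitude per candidate note
toy[57 - 40] = 1.0            # pretend all energy sits at MIDI note 57
result = cascade_classify(toy, notes, energy_decide)
print(result)
```

With 32 candidates the loop makes exactly five binary decisions (32, 16, 8, 4, 2), so every stage can be a small two-class classifier.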
Cascaded neural networks in accordance with a preferred embodiment of the invention are artificial neural networks arranged so that multiple networks are used to make a single decision. This model is also known as an intelligent neural network system whose goal is to simplify a complex classification by grouping the input space into several simple classifications. The overall structure of the cascaded neural network as shown in
Specifically, each neural network within the cascaded system is of the feed-forward topology shown in
$$a^{2} = f^{2}\left[LW^{2,1}\, f^{1}\left(IW^{1,1}\,p + b^{1}\right) + b^{2}\right] \qquad (1)$$
where $a^{2}$ is the output of the network, $f^{1}$ and $f^{2}$ are activation functions, $LW^{2,1}$ is the hidden layer weight matrix, $IW^{1,1}$ is the input layer weight matrix, $b^{1}$ and $b^{2}$ are biases, and $p$ is the input vector. Each activation function maps the result of the multiplication by its weight matrix to an output that is typically constrained between zero and one, although an unconstrained function, such as a linear or exponential function, is often used instead. Typically a sigmoid curve is used for an activation function. However, in accordance with the invention, a saturated linear function defined in equation (2)
is used for the hidden layer activation function. This is because it is a close approximation to the typical sigmoid function and is more computationally efficient since it can be represented by a simple condition statement while a sigmoid function is typically represented by a large lookup table. A purely linear activation is used on the output.
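The forward pass of equation (1) with these activation choices can be sketched in a few lines. Equation (2) is not reproduced in this text, so the sketch assumes the usual saturating linear function (0 for inputs below 0, the input itself between 0 and 1, and 1 for inputs above 1). The weights and biases below are hypothetical numbers chosen only so the arithmetic is easy to follow.

```python
def satlin(n: float) -> float:
    """Assumed saturating linear activation: clamp the input to [0, 1]."""
    if n < 0.0:
        return 0.0
    if n > 1.0:
        return 1.0
    return n


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def forward(p, iw, b1, lw, b2):
    """a2 = LW * satlin(IW * p + b1) + b2, with a purely linear output."""
    a1 = [satlin(n + b) for n, b in zip(matvec(iw, p), b1)]
    return [n + b for n, b in zip(matvec(lw, a1), b2)]


# Hypothetical weights and biases (not trained values).
iw = [[0.5, -0.25], [1.0, 1.0]]   # input layer weights
b1 = [0.1, -0.5]                  # hidden layer biases
lw = [[1.0, -1.0]]                # hidden-to-output weights
b2 = [0.25]                       # output bias

out = forward([1.0, 2.0], iw, b1, lw, b2)
print(out)
```

The two comparisons in `satlin` are the "simple condition statement" referred to above; no table look-up or exponential is needed in the hidden layer.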
Each network takes in an input vector represented by the frequency spectrum obtained during preprocessing shown in
The data used to train and test the artificial neural networks were acquired from notes played on a guitar and recorded at a sampling frequency of 44.1 kHz with a 24-bit depth. Thirty-two notes, ranging from note E2 (82.4 Hz, MIDI note 40) to note B4 (493.9 Hz, MIDI note 71), were chosen to train and test the network. Each note was recorded for approximately four seconds with varying strumming intensity and speeds to give the data a wide range of transient behavior. The data were then segmented into blocks of 1024 samples with a sliding window of 256 and organized in a large input matrix for preprocessing.
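The segmentation step can be sketched as below. The sketch assumes that "a sliding window of 256" means the block start advances by 256 samples between successive blocks (so consecutive blocks overlap by 768 samples); that interpretation, the function name, and the dropping of any trailing partial block are assumptions.

```python
def segment(samples, block_size=1024, hop=256):
    """Split a recording into overlapping blocks; a trailing partial block is dropped."""
    last_start = len(samples) - block_size
    return [samples[s:s + block_size] for s in range(0, last_start + 1, hop)]


# Stand-in for roughly four seconds of audio at 44.1 kHz.
recording = list(range(44100 * 4))
blocks = segment(recording)
print(len(blocks), len(blocks[0]), blocks[1][0])
```

Each block would then form one column (or row) of the input matrix, after the frequency spectrum of the block has been computed.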
Training for all networks is performed using the Levenberg-Marquardt back-propagation training routine in the MATLAB™ Neural Network Toolbox. This routine is designed to approach second-order training speed by approximating the Hessian rather than computing it directly. This training method was chosen due to its ease of implementation in the MATLAB™ programming environment in addition to being well-documented as the fastest training algorithm for neural networks of moderate size.
When training each network the data is split into training, validation, and testing sets. The training set is used to train the network while the validation set is used to estimate the performance of the network during training and, finally, the testing set is used to evaluate the performance of the network, once trained. Typically data is randomly split into these sub-groups and the network is trained.
Ten-fold cross validation is a method used to systematically split data between training, testing, and validation groups and to ensure that the network model is suited to its intended task and that the network has not been over-fit to the data. In ten-fold cross validation the data is split into ten folds (i.e. equal divisions of the full set of data) with training data assigned to eight folds, validation data assigned to one fold, and testing data to one fold. The network is then trained on the data set using the ten-fold split. Once training is complete, the data is circularly shifted through the ten folds and a new network is trained. This continues until all data has been used for training, validation, and testing on independent networks. Once all networks have been trained, their individual performance may be compared. Little variation in the performance of the networks indicates that the network model chosen is suitable for the application. Additionally, this allows one to choose among ten trained networks for the user's final application.
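The eight/one/one split and the circular shift through the folds can be sketched as follows. The function name is hypothetical, the folds are formed by simple striding, and any random shuffling of the data beforehand is left to the caller; training itself is outside the scope of the sketch.

```python
def ten_fold_splits(items, folds=10):
    """Yield (train, validation, test) lists, rotating the fold roles each round."""
    parts = [items[i::folds] for i in range(folds)]
    for shift in range(folds):
        rotated = parts[shift:] + parts[:shift]
        train = [x for part in rotated[:folds - 2] for x in part]
        yield train, rotated[-2], rotated[-1]


data = list(range(100))  # stand-in for indices of the training blocks
splits = list(ten_fold_splits(data))
for train, validation, test in splits[:2]:
    print(len(train), len(validation), len(test))
```

Each of the ten rounds would train an independent network, and every fold serves exactly once as the validation set and once as the testing set.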
Several statistical measures can be applied to measure and illustrate the performance of the cascaded neural network as a pitch detector. These measures not only demonstrate good performance but also give insight into choosing the best network, as discussed above. Measures of sensitivity and specificity are used to evaluate each binary network. Sensitivity reflects the network's ability to identify positive results, as shown mathematically in equation (3). Specificity is the measure of how well the network can classify negative results, as in equation (4).
A classification having high sensitivity and specificity shows a network has been well trained and suited to classifying data.
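Equations (3) and (4) are not reproduced in this text. The sketch below therefore assumes the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), where TP, FN, TN and FP are the counts of true positives, false negatives, true negatives and false positives for one binary network. The labels in the example are made up for illustration.

```python
def confusion_counts(predicted, actual):
    """Return (tp, fn, tn, fp) for binary labels given as 0 or 1."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positives that were identified as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives that were identified as negative."""
    return tn / (tn + fp)


# Made-up labels for one binary stage of the cascade.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
tp, fn, tn, fp = confusion_counts(predicted, actual)
print(sensitivity(tp, fn), specificity(tn, fp))
```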
The cascaded neural network was implemented on an Analog Devices SHARC ADSP-21469 Digital Signal Processor (DSP). The SHARC DSP is a fourth generation floating point processor capable of a clock speed of 450 MHz, or 2700 MFLOPS [9]. All operations were done using floating-point arithmetic. Although not used here, the SHARC DSP also features hardware accelerated FIR and IIR filters, as well as a hardware accelerated FFT core. Other processors could, of course, be used. The particulars of the above equipment are provided to indicate the scope of processing resources which are suitable for practice of the invention.
All functions were programmed in C with minimal optimization to illustrate the high efficiency of the system. Benchmarks were performed using the functions of the time.h header file. Although some overhead exists when using these functions, they provide a good indication of the speed of the system.
The neural networks chosen for evaluation were those that best classified the data. These networks were trained using ten-fold cross validation as described above. This provided not only evidence that the neural network correctly classified the input signal but also multiple independently trained networks from which to choose the network for the final implementation in an embedded environment.
The results showed a highly accurate classifier could be realized through cascaded neural networks with a frequency spectrum input. It can be seen in
The cascaded neural network's performance on an embedded system proved promising. The benchmarks shown in
In view of the foregoing, it is seen that the invention provides a robust and modular optical transducer for sensing vibration in any vibrational system, which is particularly suitable for use in connection with string musical instruments and which provides good sensitivity to and preservation of subtle qualities of the sound produced by the instrument. The modular transducer can be easily applied to virtually any musical instrument and can be conformed to virtually any array of strings even when the strings are not coplanar. For example, the U-shaped bracket discussed above in connection with
The cascaded neural network pitch detector was implemented on an Analog Devices fourth generation digital signal processor capable of floating point arithmetic. This device is typical of what is found in consumer music performance devices. Even with code that has not been optimized, the cascaded neural network well exceeds expectations for real-time use. Current research has shown the methods for monophonic pitch detection discussed above are also well suited for polyphonic (multiple note) pitch detection. The challenges to overcome when working with cascaded neural networks in a polyphonic setting include the exponentially larger training data set needed to represent all combinations of notes as well as developing an effective control system to correctly select the group of notes (chord) being played.
While the invention has been described in terms of a single preferred embodiment, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.
This application claims benefit of priority of U.S. Provisional Applications 61/476,791, filed Apr. 19, 2011, and 61/623,853, filed Apr. 13, 2012, which are hereby incorporated by reference in their entirety.
Number | Date | Country
---|---|---
61476791 | Apr 2011 | US
61623853 | Apr 2012 | US