Persons who wish to improve their playing of a musical instrument have traditionally relied on personal instruction and solitary practice sessions. While personal instruction can be very helpful, it is typically rather expensive and is dependent on the availability of a personal instructor to fit the schedule of the music student. Solitary practice sessions are convenient, but lack useful immediate feedback on the performance of the student performer.
A variety of devices can help performers improve their performance on a musical instrument. For example, electronic metronomes help a performer maintain a steady count. Systems have been developed for computer display of a music score (sheet music), making a wide variety of music conveniently available for practice by the performer. See, for example, U.S. Patent Application 2004/0040433 to M. Errico. Other systems assist in optical recognition of music scores for storage as digital data and subsequent computer display. See, for example, U.S. Pat. No. 5,825,905 to T. Kikuchi.
It would be helpful if a performer could utilize a synthesized performance (audio rendition) of a music score and could listen at any time to difficult passages in a music score (or in fact, listen to the entire score) played correctly, with the correct pitches (in tune) and the correct rhythms. The performer could then practice by duplicating or reproducing the correct ways of playing. It also would be helpful if a performer could view a music score on a dynamic display that is synchronized with the synthesized audio rendition, and practice playing a musical instrument or singing according to the displayed musical score. It would also be helpful if a performer could record his or her performance and then play back the performance at any time, for assessment of the performance and for comparison with the correct (synthesized) rendition. In this way, anyone wishing to practice playing a musical instrument (or voice) could be prompted with a correct musical synthesized rendition and could then evaluate his or her own performance of the music score. In addition, it would be helpful if a performer could play (or sing) along with a correct synthesized rendition.
Thus, there is a need for more convenient music score capture, performance recording, and synthesized performance and analysis techniques. The present invention satisfies this need.
The present invention provides capture and subsequent interpretation of a passage of music score (or an entire piece of music or a song) for solo instrument, multiple instruments, voice or multiple voices, or any combination thereof, processing of the data so as to produce a synthesized audio presentation and synchronized concomitant display of a visual presentation of the music score corresponding to the audio presentation, and supports recording of a performer's musical performance of that music score for later playback of the performer's musical performance. The means for providing these features can comprise application software on a host digital computer. Alternatively, these features can be provided by a handheld device that is self-contained. Both embodiments, the host computer and handheld device, include means for receiving a digital representation of the music score, a display that shows a visual presentation of the music score, and a facility for a synchronized synthesized audio rendition of the score. The digital representation of the music score can be received from a digital image capture device or over a network connection from a data source. The embodiments also can provide for recording of a user performance and playback of the user's performance. In accordance with the invention, music score data can be received from an external source such that the computing device can produce an audio presentation of the music score data and can produce a synchronized visual presentation of music notes corresponding to the audio presentation.
Other embodiments can provide additional flexibility and more convenient operation. For example, the handheld device can be adapted to receive external memory cards that can store entire musical works, volumes of works, method books, and the like in digital data format. Internet and/or telecom interfaces can allow for downloads in digital data format. For example, the device can download music scores in digital data format. Such downloads can be stored in external memory cards or similar media. Image capture of input music score data can be supported through digital photography or optical scanning of music scores. The application software implementation can include performance evaluation features and playback assistance features.
A “music minus one” feature can be provided to enable the user to digitally capture a music score that is for more than one instrument or for more than one vocal line (or any combination thereof) and to opt to have the synthesized audio presentation leave out a specified instrumental or vocal line of the music score (“minus one”) so that the user may play and/or sing along with the synthesized audio presentation. The synchronized visual presentation of the music score can include any or all of the instrumental and/or vocal parts of the original data. The user may opt to leave out more than one part of the synthesized audio presentation, such as additional instruments or vocal lines, resulting in “music minus two” or “music minus three”, and so forth, depending on the number of elements left out.
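The part-selection step of the “music minus one” feature can be sketched as a filter applied before synthesis. This is a minimal illustration that assumes a hypothetical in-memory score model (the `Part` and `Note` classes and `parts_for_synthesis` are illustrative names, not part of the eMuse product). The display keeps every part, while only the retained parts are passed to the synthesizer.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    # Hypothetical note record used only for this sketch.
    midi_pitch: int        # MIDI note number, e.g. 60 = middle C
    onset_beats: float     # start time measured in beats
    duration_beats: float


@dataclass
class Part:
    # Hypothetical part record: one instrumental or vocal line.
    name: str
    notes: list = field(default_factory=list)


def parts_for_synthesis(parts, omitted_names):
    """Return the parts the synthesizer should render.

    Omitting one name gives "music minus one", two names "music minus two",
    and so on. Unknown names are rejected so that a typo does not silently
    leave every part in the mix.
    """
    known = {p.name for p in parts}
    unknown = set(omitted_names) - known
    if unknown:
        raise ValueError(f"no such part(s): {sorted(unknown)}")
    return [p for p in parts if p.name not in omitted_names]


quartet = [Part("Violin I"), Part("Violin II"), Part("Viola"), Part("Cello")]
audio_parts = parts_for_synthesis(quartet, {"Viola"})   # music minus one
display_parts = quartet                                 # the display may still show all parts
print([p.name for p in audio_parts])  # ['Violin I', 'Violin II', 'Cello']
```

Leaving the display list untouched reflects the text above: the visual presentation can include any or all parts even when some are omitted from the audio.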
Other features and advantages of the present invention should be apparent from the following description of the preferred embodiments, which illustrate, by way of example, the principles of the invention.
In one embodiment, the features of the invention are implemented in software, comprising an application that can be installed on a digital computer. The software implementation preferably provides input and output interfaces for the performer. That is, the host computer in which the software is installed typically includes a display for producing a visual presentation of a music score that the performer can read, to sing along or play the performer's musical instrument. The computer also typically includes an input interface, such as a microphone, for recording the performer's session, and includes an output interface, such as speakers, to enable the performer to listen to the recorded performance. The computer implementation can include image capture, wherein a music score comprising notes on a staff can be digitized via an optical input means and then entered into the computer. The digitized music score can be interpreted via OCR techniques, with the resulting interpreted data being processed so as to produce a synthesized audio rendition of the music score, including when appropriate a synthesized vocal rendition matching words with appropriate pitch, such that the audio rendition is synchronized with a visual presentation of the score. In the additional detailed descriptions provided below, the computer software implementation is referred to as a “Level X” implementation or is referred to as the “eMuse X” product (the name “eMuse” referring to a product embodiment from Princeton Music Labs LLC of Princeton, N.J., USA, the assignee of all rights in the invention).
In another embodiment, the features of the invention are embodied in a handheld device that can include a display, an input interface, audio and visual output interfaces, and OCR image interpretation interfaces. The handheld device implementation includes a variety of user control knobs and mechanisms for convenient navigation of the device functions. The display supports a visual presentation of menu options for selection of functions by the user.
As described further below, a computing device interprets and processes music score data by receiving the music score data from an external source and subsequently producing a synthesized audio rendition of the music score data and a synchronized visual presentation of the music score.
The external source can consist of a network data source that provides the music score data to the computing device over a network connection. The network connection can consist of communication between the computing device and the network over a wireless connection.
The music score data can be read from a recorded medium by accepting the recorded medium into a reader of the computing device that then obtains the music score data from the recorded medium. The recorded medium contains sufficient data for synthesized audio rendition in accordance with a musical instrument digital interface (MIDI) specification for synthesized music production. That is, the computing device can receive data that specifies a music score and can generate or synthesize corresponding musical tones in a selected tempo, timbre, clef, key signature, time signature, and the like. The recorded medium can comprise a flash memory device.
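As a concrete illustration of the data a MIDI-based rendition needs, the sketch below converts a spelled pitch (step, alteration, octave, as recognized from the score) into a MIDI note number and an equal-tempered frequency. It assumes the common conventions that middle C (C4) is MIDI note 60 and that A4 is tuned to 440 Hz; `midi_number` and `frequency_hz` are hypothetical names.

```python
# Semitone offset of each natural step above C within an octave.
STEP_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_number(step: str, octave: int, alter: int = 0) -> int:
    """MIDI note number, assuming C4 (middle C) = 60.

    `alter` is +1 for a sharp, -1 for a flat, 0 for a natural.
    """
    return 12 * (octave + 1) + STEP_SEMITONES[step] + alter


def frequency_hz(midi: int, a4_hz: float = 440.0) -> float:
    """Twelve-tone equal temperament frequency relative to A4 (MIDI 69)."""
    return a4_hz * 2 ** ((midi - 69) / 12)


print(midi_number("C", 4))                          # 60
print(midi_number("F", 4, alter=1))                 # 66 (F-sharp above middle C)
print(round(frequency_hz(midi_number("A", 4)), 1))  # 440.0
```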
The computing device can be provided with ability for recording a user performance of a music score and providing playback of the recorded user performance. The user performance playback can occur independently of the synthesized music score rendition, or can occur simultaneously. In addition, the user performance playback can be provided along with a visual representation of the musical notes corresponding to the recorded user performance. In this way, a “music dictation” feature is provided.
In one alternative, the music score data used by the device to generate both the synthesized audio rendition and the synchronized visual presentation of the music score can be obtained by the device optically capturing a digital image of a music score, then interpreting and processing the digital information to produce a collection of data representing appropriate music notes, thus generating data that corresponds to the musical score.
In addition, musical contextual information can be provided that determines characteristics of the synthesized audio rendition of the music score data, all of which may be adjusted by the user. Such musical contextual information can include multiple key signatures, time signatures, timbre, tempo, and expressive terms such as legato, crescendo, and ritard.
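One way to represent this musical contextual information is as a small record with user-adjustable fields, with expressive terms translated into performance parameters. The sketch below assumes a hypothetical `MusicalContext` record and renders a crescendo as a linear ramp of MIDI note velocities (1 to 127); neither the field names nor the linear ramp are taken from the eMuse product.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MusicalContext:
    # Hypothetical record of contextual information; fields are assumptions.
    key_signature: str = "C major"
    time_signature: tuple = (4, 4)   # (beats per measure, beat unit)
    timbre: str = "piano"
    tempo_bpm: float = 100.0
    expressive_terms: tuple = ()     # e.g. ("legato", "crescendo")


def crescendo_velocities(n_notes: int, start: int = 48, end: int = 96) -> list:
    """Linearly ramp MIDI velocities across n_notes (clamped to 1..127)."""
    if n_notes == 1:
        return [max(1, min(127, end))]
    step = (end - start) / (n_notes - 1)
    return [max(1, min(127, round(start + i * step))) for i in range(n_notes)]


score_context = MusicalContext(key_signature="G major", tempo_bpm=84)
# The user overrides tempo and timbre without touching the score's other settings.
user_context = replace(score_context, tempo_bpm=60, timbre="violin")
print(user_context.key_signature, user_context.tempo_bpm)  # G major 60
print(crescendo_velocities(5))                              # [48, 60, 72, 84, 96]
```

Using an immutable record with `replace` keeps the values read from the score intact, so a user adjustment can always be undone by returning to the original context.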
In another alternative, producing an audio playback of the music score data and a visual presentation of musical notes is effected through communication with a network data source. If desired, the network data source provides the music score data to the computing device. In yet another alternative, the network data source can provide to the computing device the musical contextual information that determines musical characteristics of the synthesized audio rendition of the music score data. Also, the network data source can provide the musical contextual information over a wireless connection.
In one alternative, producing a synthesized audio rendition of the music score data and a synchronized visual presentation of the music score is effected by inserting a recorded medium into a reader of the computing device. If desired, the computing device obtains the music score data from the recorded medium, and the recorded medium can also provide the musical contextual information to the computing device for determining musical characteristics of the synthesized audio rendition of the music score data.
One optional feature is to provide recording of the user's instrumental and/or vocal performance of the music score. Another alternative is to produce a synthesized audio rendition in accordance with a musical instrument digital interface (MIDI) specification. In addition, producing the visual presentation can consist of displaying the music score synchronized with the corresponding synthesized audio rendition. Another option is to provide simultaneous, synchronized playback of both the visual presentation and audio rendition of the music score data and of both the audio component of the recorded user performance and a synchronized visual display of the music score generated from the user performance.
In accordance with the invention, a computing device can optically capture a digital image of a music score, interpret the digital image to generate music score data that corresponds to the captured music score, and produce a synthesized audio rendition of the music score data and a synchronized visual presentation of the music score. The computing device can receive musical contextual information that it uses to determine musical characteristics of the synthesized audio rendition of the music score data. As in the alternative embodiment described above, the musical contextual information can include multiple key signatures, time signatures, timbre, tempo, and expressive terms such as legato, crescendo, and ritard, which can be selected by the user to determine the musical characteristics of the synthesized audio rendition of the music score data. As an option, the computing device identifies the musical contextual information from the optically captured music score, and optionally it can obtain the musical contextual information from a network data source. If desired, the network data source provides the musical contextual information over a wireless connection with the computing device.
The computing device can be provided with its own loudspeakers for audio playback of synthesized renditions and/or performances recorded by the user. Additionally, the device can include an output jack for connection to headphones or external loudspeakers or the like, and can also be provided with wireless transmission capability that allows the device to transmit an audio performance to a wireless sound playback system (such as a home stereo system that has been enabled with wireless components). The device has sufficient computing memory to enable it to store musical passages of predetermined length.
The additional detailed descriptions below refer to various implementations of features in the handheld device implementation and are referred to as “Level 1” and “Level 2” or “eMuse 1” and “eMuse 2”, respectively.
The following discussion describes music playback software that can be installed on a range of digital computing devices, and also describes embodiments of a handheld sheet music reading device, herein collectively referred to as the eMuse product. References to “Company” are references to an entity that provides data or other support for proper operation of the eMuse product. References to “PML” are references to “Company”, Princeton Music Labs, LLC (the assignee of all rights in the invention), or other suitable support entity.
Attached as
The
A record/playback feature of the device 100 allows the user to immediately evaluate a recorded performance with reference to the music score. That is, the device 100 can record a user's performance of the musical piece and play back the user's performance, along with (or simultaneously with) playback of the received musical piece. The user performance playback can be presented with a corresponding visual presentation, providing the “music dictation” feature described further in this document. Both a metronome and a musical tone tuner capability are also incorporated into the device, and the device can be adjusted for “music minus one.” In a multi-staff or multi-part piece of music, the “music minus one” feature allows the user to determine which part(s) of the piece will be played back by the MIDI interface. This allows the user to play/sing a specific part along with the device.
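The tuner capability can be illustrated by measuring how far a detected frequency lies from the nearest equal-tempered pitch, in cents (hundredths of a semitone). This is a minimal sketch assuming A4 = 440 Hz and sharp-based note names; how the device obtains the input frequency is not specified here, and `nearest_note` is a hypothetical name.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz: float, a4_hz: float = 440.0):
    """Return (note name with octave, cents deviation) for a frequency.

    Positive cents mean the tone is sharp; negative cents mean it is flat.
    Assumes twelve-tone equal temperament referenced to A4.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Fractional MIDI number; 69 corresponds to A4.
    midi_float = 69 + 12 * math.log2(freq_hz / a4_hz)
    midi = round(midi_float)
    cents = 100 * (midi_float - midi)
    name = f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"
    return name, cents


name, cents = nearest_note(445.0)
print(name, round(cents, 1))  # A4 19.6
```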
Control buttons are also provided for controlling, as illustrated in
The features of the product can be summarized as follows:
Functional Description:
“Reading” the Musical Score
A digital camera system 114 captures an image of a passage (a single note, several measures, or even an entire page) within a musical score. The digital camera can be built into the device 100 and can comprise a lens and image transducer combination that will be familiar to those skilled in the art. The LCD display 102 allows the user to determine exactly which measures are captured. The device can read a single stave musical line, duets, trios, quartets, or even a full conductor's score. The device 100 offers multiple simultaneous timbres.
Processing the Music and Downloading Contextual Information
The OCR module receives the “photograph” of the musical excerpt, comprising digitized image data. Important additional musical contextual information, such as key signature and meter, is also sent to the OCR module, either via the music score digital image or via a “cheat sheet” (downloaded from the PML website, then transmitted wirelessly or via the USB port to the device; see below) that lists all available key signatures and time signatures. The “cheat sheet” also includes a section from which the user can select the desired timbre(s), or the user can manually specify (input) the desired timbre(s).
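A key signature read from the score (or selected from the “cheat sheet”) changes the pitch of every unmarked note it governs, so it has to be applied to the recognized notes before synthesis. The sketch below assumes key signatures are stored as a signed count of sharps (positive) or flats (negative), the same convention MusicXML uses for its `fifths` element, and lets an explicit accidental override the signature. The function names are illustrative, and carrying an accidental through the rest of a measure is deliberately omitted.

```python
# Order in which sharps and flats are added to a key signature.
SHARP_ORDER = "FCGDAEB"
FLAT_ORDER = "BEADGCF"


def key_alteration(step: str, fifths: int) -> int:
    """Alteration (+1 sharp, -1 flat, 0 natural) that a key signature gives a step.

    `fifths` counts sharps when positive and flats when negative,
    e.g. 1 = G major / E minor, -3 = E-flat major / C minor.
    """
    if fifths > 0:
        return 1 if step in SHARP_ORDER[:fifths] else 0
    if fifths < 0:
        return -1 if step in FLAT_ORDER[:-fifths] else 0
    return 0


def effective_alter(step: str, fifths: int, explicit=None) -> int:
    """An explicit accidental printed on the note overrides the key signature."""
    return explicit if explicit is not None else key_alteration(step, fifths)


print(effective_alter("F", 1))              # 1  (F-sharp in G major)
print(effective_alter("F", 1, explicit=0))  # 0  (a natural sign cancels it)
print(effective_alter("A", -3))             # -1 (A-flat in E-flat major)
```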
MIDI Synthesizer
The OCR module sends the sound information to the MIDI module, which produces synthesized sound. The MIDI module offers adjustable timbre: the user specifies the type of instrument (piano, violin, flute, etc.) for the particular musical passage or piece. The module also offers adjustable tempo, so that the user can hear the passage slower (or faster) than the metronome marking (if any) indicated in the score without any alteration of pitch. The device plays back through its own small loudspeaker, and also has a headphone jack 134 and wireless capability for headphones and/or external speakers.
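Because the MIDI module synthesizes each note at its specified pitch, changing the tempo only rescales when events occur; no pitch shifting of recorded audio is involved. The sketch below converts beat-based note timings into seconds for a chosen tempo and picks a General MIDI program number for the selected timbre. The `GM_PROGRAMS` table lists a few standard General MIDI Level 1 assignments, shown 0-based as carried in a MIDI program change message; the `schedule` function and its event layout are hypothetical.

```python
# A few General MIDI Level 1 program numbers, 0-based as sent in a program change.
GM_PROGRAMS = {"piano": 0, "violin": 40, "cello": 42, "trumpet": 56,
               "clarinet": 71, "flute": 73}


def schedule(notes, tempo_bpm: float, timbre: str):
    """Convert (midi_pitch, onset_beats, duration_beats) tuples to timed events.

    Returns (program, events), where each event is
    (start_seconds, end_seconds, midi_pitch). The pitch is copied unchanged,
    so slowing down or speeding up never alters which note is heard.
    """
    if tempo_bpm <= 0:
        raise ValueError("tempo must be positive")
    seconds_per_beat = 60.0 / tempo_bpm
    events = [(onset * seconds_per_beat,
               (onset + dur) * seconds_per_beat,
               pitch)
              for pitch, onset, dur in notes]
    return GM_PROGRAMS[timbre], events


phrase = [(67, 0, 1), (69, 1, 1), (71, 2, 2)]  # G4, A4, then B4 held for two beats
program, at_120 = schedule(phrase, 120, "flute")
_, at_60 = schedule(phrase, 60, "flute")
print(program)     # 73
print(at_120[-1])  # (1.0, 2.0, 71)
print(at_60[-1])   # (2.0, 4.0, 71)
```

Halving the tempo doubles every start and end time while the pitch column stays identical, which is the property the text describes.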
Visual Display
The LCD monitor display 102 helps the user make sure that the measures being captured (photographed) are the measures that are intended to be heard. The LCD monitor display, complete with a cursor 136, displays the music score 104 as the passage is played back, either from a passage that was photographed by the user or from a music-card with stored data. The cursor indicates the exact musical position in the score of the current note(s) being played as the passage is played in real time, regardless of the specified tempo. Rather than a traditional type of moving cursor, the display 102 can instead indicate the note being played by highlighting the note (e.g., making it brighter) or by giving it a different display color from the other notes as it is played. Another option is for the LCD display to show the names of the notes (both in English and in solfege) 138, particularly for a single-line passage. If the passage consists of multiple simultaneous musical lines, the user can specify the line for which the names of notes are displayed.
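The note-name display can be driven by a simple lookup from the recognized pitch to its English letter name and its solfege syllable. The sketch assumes fixed-do syllables (do = C regardless of key) and handles only single sharps and flats; a movable-do display would need the key as an additional input. `note_labels` is an illustrative name.

```python
# Fixed-do syllables (an assumption); some traditions use "ti" instead of "si".
SOLFEGE = {"C": "do", "D": "re", "E": "mi", "F": "fa",
           "G": "sol", "A": "la", "B": "si"}
ACCIDENTAL = {-1: "♭", 0: "", 1: "♯"}  # single accidentals only in this sketch


def note_labels(step: str, alter: int = 0):
    """Return (English name, fixed-do solfege name) for a spelled pitch."""
    suffix = ACCIDENTAL[alter]
    return step + suffix, SOLFEGE[step] + suffix


print(note_labels("F", 1))   # ('F♯', 'fa♯')
print(note_labels("B", -1))  # ('B♭', 'si♭')
```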
The display 102 also shows an indicator of the music score passage selected for play. The indicator is referred to as the passage marker 139. In
Recording Sensor
The microphone 112 is provided so that the user can record him/herself playing (and/or singing) the musical passage in question and immediately play back the recording to compare the user's performance with that of the device 100 (that is, of a previously recorded or synthesized rendition). This feature is helpful for students to make adjustments in notes, tuning, rhythm, and dynamics. As noted above, a user performance can be recorded via the microphone to provide the “music dictation” feature.
Wireless
The device 100 is preferably provided in wireless versions to permit wireless communications with networks and other wireless-enabled devices, and to permit downloads of encoded music files with contextual information. The features described herein can be provided by eMuse software installed on a wireless platform, such as a PDA or smartphone, for portable music interaction. In addition, wireless eMuse devices can use the computing resources and memory (and audio playback) of the home PC and/or stereo system.
Power
Power is supplied by rechargeable batteries; DC input (9/12 volts) is also available through an external connection 140.
“Music-Card” Feature
A card (information storage device) digitally encoded with an entire musical piece (or a simple method book) can be inserted into the OCR module at the card slot 106. This allows the user quick reference (auditory and visual—see Visual Display above) to specific measures.
“Music-Cards”
These will be available for retail purchase and can comprise conventional media, such as Secure Digital (SD) cards, CompactFlash cards, xD cards, or “Memory Stick” devices such as those available from Sony Corporation. In addition, PML will offer a substantial library of music (computer file representations of scores, both visual and aural), ranging from method books to more complex standard Classical repertory to jazz and pop “hits”, available as password-encrypted downloads for eMuse users. These files will be downloadable to the user's home PC, with the user then either burning a “music-card” or transmitting the file to the wireless eMuse.
Network Communications
The eMuse devices can communicate over telecom networks to download encoded music files from music retailers (such as Tower, HMV, etc.) and ring tone providers.
“Music Dictation”
In another embodiment, a user's performance can be recorded by the device and subjected to music note interpretation processing to generate data from which a display of the music notes corresponding to the user's recorded performance is produced. In this way, the device can take “musical dictation” and can convert a live audio performance by the user into a visual display of the music score corresponding to the performance. Thus, the music interpretation features of the device can process both music score data received by optical capture or electronic network communication and music score data produced by a user's live performance captured by a microphone.
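The first stage of such music note interpretation, estimating the pitch of a sustained tone in the recorded signal, can be sketched with time-domain autocorrelation. This is a deliberately simple technique chosen for illustration, not necessarily the one used in eMuse: it finds the lag at which the signal best matches a shifted copy of itself, converts that lag to a frequency, and maps the frequency to the nearest note name (assuming A4 = 440 Hz). Real dictation would also need onset detection and rhythm quantization, which are omitted; the function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of a monophonic frame."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, min(max_lag, len(samples) - 1) + 1):
        # Unnormalized (biased) autocorrelation: the sum covers fewer samples as
        # the lag grows, which favors the true period over its multiples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("frame too short for the requested frequency range")
    return sample_rate / best_lag


def note_name(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name, e.g. 'A4'."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


rate = 8000
tone = [math.sin(2 * math.pi * 440.0 * t / rate) for t in range(1024)]
f0 = estimate_pitch(tone, rate)
print(note_name(f0))  # A4
```

Because the lag is an integer number of samples, the frequency estimate is coarse (here about 444 Hz for a 440 Hz tone), which is adequate for naming the note but not for fine tuning.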
Product Versions—eMuse1, eMuse2, eMuseX
The embodiments illustrated herein include the following three products:
The software described herein can be used in a variety of platforms. For example, aspects of eMuse could be embedded in a high-end cell phone in which the cell-phone camera photographs a specific passage in a musical score. The captured image is then compressed and sent to a remote server, which performs OCR operations on the image data to interpret the image into corresponding musical note information. The server then sends back both a MIDI file and a graphic file, enabling this version of eMuse to play the music that was photographed and display the notes on the LCD as they are played.
Thus, eMuse software can be installed in a user's platform of choice—such as a camera-equipped telephone or similar PDA, in addition to the devices preloaded with eMuse software.
The software that interprets the captured music score image into a corresponding set of notes utilizes machine learning techniques and will be trained to achieve an accuracy rate approaching 100%, while interpreting substantially in real time. Conventionally available musical notation OCR software for converting musical note images into corresponding notes is generally intended for offline editing, at a time removed from the actual image capture, and cannot achieve near-100% accuracy. Currently available music score conversion software includes products such as Sibelius®, Smart-Score®, and SharpEye®.
In all embodiments described herein, eMuse encoded contextual files can be received over a telecommunications link, either wired or wireless, such as WiFi, Bluetooth® and/or other telecom connections.
eMuse1
The note data interpretation process 208 receives the digital data corresponding to the music score and processes it to produce a set of musical notes and concomitant information sufficient to specify the musical score and enable its reproduction by suitable hardware. The process 208 comprises a processor trained with machine learning techniques to recognize the music score digital data 206, 210 and produce appropriate transformed data. The process 208 can be trained, for example, using neural network software engineering techniques to increase the accuracy of the interpretation process up to substantially 100% accuracy. In accordance with the present invention, the incoming music score data must be produced for audio and visual presentation to the user in real time, and therefore interpretation of the incoming music score data must be in real time and must approach 100% accuracy of interpretation (transformation). The process 208 utilizes optical character recognition (OCR) techniques, but is adapted for music note recognition and interpretation of digital data (electronic or optical scan derived) to an appropriate representation.
The interpretation process output 212 comprises a visual presentation of the music score, which is provided to a display screen 214 of the device, and also a synthesized audio rendition of the music score, which is provided to appropriate device systems and hardware 216 for audio presentation through loudspeakers of the device, or the like.
eMuse2
eMuseX
In the next operation 404, the music score digital representation is provided to the note data interpretation process of the host computer, either by operation of the image capture combination or by operation of the electronic data receiving combination. The interpreted musical score data is provided to the host computer for processing and presentation 406, such that a display presentation 408 and an audio reproduction presentation 410 are generated by systems of the host computer for presentation to the user. The audio presentation 410 and display presentation 408 will generally correspond to the respective audio presentation 216 and display presentation 214 of the dedicated devices (
Additional Functionality
Buttons and Dials, Ports and Jacks
The following design features are provided (see
If desired, one or more of these buttons and dials can be combined in a simple +/− toggle.
Construction
Commands and data are stored in memory 706, which can include program memory or ROM 708 and data memory or RAM 710. The memory 706 can be a mixture of volatile and non-volatile memory. The CPU executes commands and program instructions stored in program memory 708 to provide the features described herein. Operational data, such as music score data and the like, are stored in the data memory 710. Other data can be stored or received from storage devices 715 such as fixed storage devices (such as hard disk drives), storage drives for removable media (magnetic floppy disks, optical discs), and removable data cards (such as flash cards and similar media). Thus, the eMuse processing 712, including the note interpretation processing engine 714, shown as a separate component in
The device 700 also includes a keyboard 718, for receiving user inputs and commands, and includes a display 720, for presentation of data to the user. The display can comprise a display screen of a handheld device constructed in accordance with the invention, or can comprise a display of a host computer in which an application software embodiment of the invention is installed. The device also includes audio output 722, such as loudspeakers that can produce the audio rendition of a music score. The audio output facility 722 can also include headphone connections for private listening or other line out connections.
The device 700 also includes an image capture facility 724, such as an integrated digital camera system having a lens and shutter control button. The image capture facility can be an externally connected system, such as where a digital camera might be connected to a host computer via a network connection such as a USB port or wireless Bluetooth link. The device also includes an audio capture facility 726, such as a microphone connected to the device.
Examples of Usage
Following are scenarios of device usage, which illustrate how embodiments of the invention can be put to use.
Summary of Operation for the Handheld Device with Music Card Reader
The user inserts a music card into the product's music card slot, selects a music piece, selects timbres, selects a tempo, selects a location in the piece (the “passage marker”), and pushes the play button. The stop button stops the music. The play button starts play of the passage, at the beginning or taking it up again where it left off in a previous session (the “play location marker”), unless the user presses the back arrow button, in which case the play location marker moves back to the passage marker.
With the exceptions of volume and either tempo or the location of the passage marker, all selections are made by a single wheel/button or knob/button combination (in
Once operation is proceeding after initiation, the menu wheel 120 changes roles and always controls either tempo or the location of the passage marker. If it controls tempo, then volume and the passage marker have separate wheels; if it controls the passage marker, then volume and tempo have separate wheels. There are two reasons for having three separate wheels:
Details
On/Off
The unit powers up when the user depresses the on/off button. It may also power up when a flash drive is inserted. It shuts off when the flash drive is removed or after a selected period of inactivity, or when the user depresses the on/off button again.
Menu Sequence
Upon application of power, the unit enters the menu sequence. This is a series of choices that the user must make before play can begin. It appears as a sequence of lists (usually just two). The user can scroll through each list with the menu wheel. The next button registers the highlighted choice and moves to the next step.
The sequence of lists is dynamic, but may contain the following:
As the lists are presented, the screen appears as follows:
These instructions may take the form of labels for the wheel and button.
After the last selection has been made, the unit advances to play mode.
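The menu sequence behaves like a small step-through selector: the menu wheel moves a highlight within the current list, the next button commits the highlighted choice and advances, and the unit enters play mode after the last list. The sketch below assumes a fixed list of steps (for example, piece and then timbre), whereas the product's sequence is described as dynamic; the `MenuSequence` class and its methods are hypothetical.

```python
class MenuSequence:
    """Hypothetical model of the wheel/next-button menu sequence."""

    def __init__(self, steps):
        # steps: list of (label, options), e.g. [("Piece", [...]), ("Timbre", [...])]
        self.steps = steps
        self.reset()

    def reset(self) -> None:
        """Restart the sequence, as the reset button does."""
        self.step_index = 0
        self.highlight = 0
        self.choices = {}

    @property
    def in_play_mode(self) -> bool:
        return self.step_index >= len(self.steps)

    def turn_wheel(self, clicks: int) -> None:
        """Scroll the highlight, clamped to the ends of the current list."""
        if self.in_play_mode:
            return  # in play mode the wheel is reassigned (e.g. to tempo)
        options = self.steps[self.step_index][1]
        self.highlight = max(0, min(len(options) - 1, self.highlight + clicks))

    def press_next(self) -> None:
        """Register the highlighted choice and move to the next list."""
        if self.in_play_mode:
            return  # the next button has no effect once the sequence completes
        label, options = self.steps[self.step_index]
        self.choices[label] = options[self.highlight]
        self.step_index += 1
        self.highlight = 0


menu = MenuSequence([("Piece", ["Minuet in G", "Ode to Joy"]),
                     ("Timbre", ["piano", "violin", "flute"])])
menu.turn_wheel(1)
menu.press_next()
menu.turn_wheel(5)  # clamped to the last entry, "flute"
menu.press_next()
print(menu.choices, menu.in_play_mode)
# {'Piece': 'Ode to Joy', 'Timbre': 'flute'} True
```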
Play
When the unit is in play mode, the screen displays music notation (with one or more location markers) in the middle, and the current metronome marking, probably near its adjustment wheel.
Music Notation
Music notation is presented on the display screen in accordance with the physical size of the screen; generally sufficient size is available to show a single staff or system of staves.
Positioning
The product keeps track of two places in the music: the point selected by the user as the beginning of a passage to play (the “passage marker”), and the point that is currently being played (the “play location marker”). The passage marker is noted by the device to indicate, for example, a measure that is being played. The play location marker is described herein as the cursor. Preferably, the user can choose to show the play location marker (cursor) only, since the back arrow button moves the play location marker to the passage marker location, and since the position wheel moves both markers at once.
Generally, the play location marker will not change appearance during playback. Notes on the display will change color or brightness when they are sounding during playback.
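To change the color or brightness of the notes that are sounding, the display needs to map the current playback time to notes. A minimal sketch, assuming note onsets and durations have already been converted to seconds for the selected tempo: binary search over the sorted onsets bounds the candidates, and any note whose span contains the current time is reported as sounding, so chords and held notes are handled. `SoundingNoteIndex` is a hypothetical name.

```python
from bisect import bisect_right


class SoundingNoteIndex:
    """Find which notes are sounding at a playback time (seconds)."""

    def __init__(self, notes):
        # notes: iterable of (onset_s, duration_s, note_id)
        self.notes = sorted(notes)
        self.onsets = [n[0] for n in self.notes]
        self.max_duration = max((n[1] for n in self.notes), default=0.0)

    def sounding_at(self, t: float):
        """Return ids of notes with onset <= t < onset + duration."""
        hi = bisect_right(self.onsets, t)  # notes starting at or before t
        # A note sounding at t must start after t - max_duration,
        # so only a bounded window before `hi` needs checking.
        lo = bisect_right(self.onsets, t - self.max_duration)
        return [nid for onset, dur, nid in self.notes[lo:hi] if t < onset + dur]


timeline = SoundingNoteIndex([(0.0, 2.0, "C3"), (0.0, 0.5, "E4"),
                              (0.5, 0.5, "G4"), (1.0, 1.0, "C5")])
print(timeline.sounding_at(0.75))  # ['C3', 'G4']
```

Calling `sounding_at` once per display refresh with the elapsed playback time keeps the highlighting synchronized regardless of the selected tempo.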
Controls affecting positioning are the passage marker wheel, the play button, and the back arrow button. The two buttons move the play location marker without affecting the passage marker; the wheel moves both markers.
Tempo
When the unit enters play mode, the default tempo appears on the screen near its wheel in the form of a metronome marking 142. The tempo can be changed at any time.
Product Controls
Reset Button 130
Restarts the operational sequence, as if power had just been applied. The display shows the list of pieces on the music card drive.
Next Button 130
Selects the highlighted item from the list controlled by the menu wheel and moves to the next step. Once the menu sequence completes, the next button becomes ineffective. Until the reset button is pushed or power is cycled, the menu wheel continues to control tempo, and any changes are reflected immediately.
Menu Wheel
File selection
See “Menu sequence”, above.
Timbre selection
See “Menu sequence”, above.
Tempo Wheel 118
The tempo wheel 118 controls the playback tempo of the music passage.
Passage Marker Wheel
Control of the passage marker is through the menu wheel. Preferred operation of the menu wheel in Passage Marker mode is as follows:
If the unit is playing when the wheel is moved, the unit stops.
Volume Wheel
Control of the playback volume is through the volume wheel.
Play Button 126
Commences play at the play location marker and moves the marker.
Stop Button 128
Stops play. Leaves the play location marker where it is.
Back Arrow Button (Move to Beginning of Passage)
Stops play, if appropriate, and moves the play location marker to the passage marker.
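The interaction of the two markers with the play, stop, and back arrow buttons and the position wheel can be summarized as a small state model. This is a sketch under stated assumptions: positions are counted in measures, playback progress is simulated by an `advance` call, and turning the wheel places both markers at the new passage position. The `Transport` class and its method names are hypothetical.

```python
class Transport:
    """Hypothetical model of the passage marker and play location marker."""

    def __init__(self, last_measure: int):
        self.last_measure = last_measure
        self.passage_marker = 1
        self.play_location = 1
        self.playing = False

    def play(self):
        """Commence play at the play location marker."""
        self.playing = True

    def stop(self):
        """Stop play, leaving the play location marker where it is."""
        self.playing = False

    def back_arrow(self):
        """Stop play and return the play location marker to the passage marker."""
        self.playing = False
        self.play_location = self.passage_marker

    def turn_position_wheel(self, clicks: int):
        """Move both markers (assumed: to the same new measure); stops play."""
        self.playing = False
        new_pos = max(1, min(self.last_measure, self.passage_marker + clicks))
        self.passage_marker = self.play_location = new_pos

    def advance(self, measures: int):
        """Simulate playback progress; only the play location marker moves."""
        if self.playing:
            self.play_location = min(self.last_measure, self.play_location + measures)


t = Transport(last_measure=32)
t.turn_position_wheel(8)  # both markers at measure 9
t.play()
t.advance(4)              # play location now at measure 13
t.stop()
print(t.passage_marker, t.play_location)  # 9 13
t.back_arrow()
print(t.play_location)    # 9
```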
Meta Controls
Screen
The size and shape of the screen can be set at runtime by resizing the window on which it and the emulated product controls appear. Meta controls may be added to set limits to resolution, brightness, and contrast.
Files
The device operates on standard MusicXML files, and also can process condensed or compressed forms of those files.
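Reading a MusicXML file into note data can be done with a standard XML parser. The sketch below handles only the uncompressed `score-partwise` form, collecting pitched notes (step, alter, octave, duration in divisions) from each part and skipping rests; it ignores chords, ties, voices, and backup/forward elements, and it does not open the compressed `.mxl` container. It is an illustrative assumption, not the product's file handling, and `read_partwise_notes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def read_partwise_notes(xml_text: str):
    """Return {part_id: [(step, alter, octave, duration_divisions), ...]}."""
    root = ET.fromstring(xml_text)
    if root.tag != "score-partwise":
        raise ValueError("only score-partwise MusicXML is handled in this sketch")
    parts = {}
    for part in root.findall("part"):
        notes = []
        for note in part.iter("note"):
            pitch = note.find("pitch")
            if pitch is None:
                continue  # a rest or unpitched note
            step = pitch.findtext("step")
            # Fractional (microtonal) alter values are truncated toward zero here.
            alter = int(float(pitch.findtext("alter", default="0")))
            octave = int(pitch.findtext("octave"))
            duration = int(note.findtext("duration", default="0"))
            notes.append((step, alter, octave, duration))
        parts[part.get("id")] = notes
    return parts


SAMPLE = """<score-partwise version="3.1">
  <part-list><score-part id="P1"><part-name>Flute</part-name></score-part></part-list>
  <part id="P1">
    <measure number="1">
      <attributes><divisions>1</divisions></attributes>
      <note><pitch><step>F</step><alter>1</alter><octave>5</octave></pitch><duration>2</duration></note>
      <note><rest/><duration>1</duration></note>
      <note><pitch><step>G</step><octave>5</octave></pitch><duration>1</duration></note>
    </measure>
  </part>
</score-partwise>"""

print(read_partwise_notes(SAMPLE))
# {'P1': [('F', 1, 5, 2), ('G', 0, 5, 1)]}
```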
Music Card Flash Drive
The flash drive interface 106 accepts memory cards in various formats. In an emulated version of the product, a meta control is used to select a directory containing MusicXML files, and that directory stands in for the flash drive. The device then displays those files at the beginning of each menu sequence. If the directory representing the flash drive contains subdirectories, they are not displayed. If a real flash drive is inserted into the machine, it can be selected.
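The directory-listing behavior described above (show MusicXML files, ignore subdirectories) can be sketched with the standard library. The accepted extensions are an assumption: `.musicxml` and `.xml` for uncompressed MusicXML and `.mxl` for the compressed form. `list_music_files` is a hypothetical name, and the usage below builds a throwaway directory to stand in for a card.

```python
import os
import tempfile

# Assumed set of MusicXML file extensions (compared case-insensitively).
MUSICXML_EXTENSIONS = (".musicxml", ".xml", ".mxl")


def list_music_files(directory: str):
    """Return sorted names of MusicXML files directly inside `directory`.

    Subdirectories, and anything inside them, are not listed.
    """
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries
                      if entry.is_file()
                      and entry.name.lower().endswith(MUSICXML_EXTENSIONS))


with tempfile.TemporaryDirectory() as card:
    for file_name in ("minuet.musicxml", "etude.MXL", "readme.txt"):
        open(os.path.join(card, file_name), "w").close()
    os.mkdir(os.path.join(card, "extras"))
    open(os.path.join(card, "extras", "hidden.xml"), "w").close()
    found = list_music_files(card)

print(found)  # ['etude.MXL', 'minuet.musicxml']
```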
The present invention has been described above in terms of a presently preferred embodiment so that an understanding of the present invention can be conveyed. There are, however, many configurations for music score capture and presentation systems not specifically described herein but with which the present invention is applicable. The present invention should therefore not be seen as limited to the particular embodiments described herein, but rather, it should be understood that the present invention has wide applicability with respect to music score capture and presentation generally. All modifications, variations, or equivalent arrangements and implementations that are within the scope of the attached claims should therefore be considered within the scope of the invention.
This application claims the benefit of priority of co-pending U.S. Provisional Patent Application Ser. No. 60/636,465 entitled “Sheet Music Synthesized Performance, Presentation, and Playback System and Method”, by Robert Taub filed Dec. 15, 2004. Priority of the filing date of Dec. 15, 2004 is hereby claimed, and the disclosure of the Provisional Patent Application is hereby incorporated by reference.
Number | Date | Country
---|---|---
60636465 | Dec 2004 | US