The present disclosure generally relates to computer-implemented methods and systems. More specifically, the present disclosure relates to a computer-implemented method for identifying a song that a user wants to perform, and to a system thereof.
This section illustrates useful background information without admission that any technique described herein is representative of the state of the art.
In a conventional solution, the user selects a song and then operates the user interface so that the solution, such as a mobile application, provides the user with the musical notation of the song or starts playing the song or a backing track of the song. The user can then play along with the backing track.
Playing a song is, however, often a personal, emotional and expressive experience. Ideally, the player knows the song by heart and is able to just play from the heart and sing. The next best thing is playing with the aid of lyrics and chord charts or tablature. Things that distract from this experience include, e.g., scrolling through lyrics while playing, which causes the user to remove a hand from the instrument, or browsing through lists of songs, which necessitates the user switching from a play mindset to an analyze mindset.
However, automatic detection of the song from a freely performed user performance, such as a user playing an instrument at their own tempo and with their own choice of harmonies without adhering to the details of a predefined arrangement/notation of the song, is difficult. When playing freely, the performer is usually reading chord-based music notation that describes the harmonic progression of the song with chord names, chord symbols, or chord diagrams, without detailing the note-by-note arrangement of the song or the rhythmic patterns. Playing freely is very common when playing the part of an accompanying instrument in a song. The challenge in detecting a specific song from such free playing is that the amount of information contained in a few chords is typically not sufficient to disambiguate the song from among a large collection of songs.
The appended claims define the scope of protection. Any examples and technical descriptions of apparatuses, products and/or methods in the description and/or drawings not covered by the claims are presented not as embodiments of the invention but as background art or examples useful for understanding the invention.
One goal of the invention is that a user utilizing the system and method according to the present disclosure does not need to click through an application user interface or menus in order to choose a song or songs that the user wants to perform. Instead, the user can simply start playing, and the system and method recognize the song from the user play history and the current user play performance; the recognized song may then be used to display written music and/or lyrics for the user automatically. Further, the system and method may provide backing track audio pertaining to the recognized song at the tempo of the user, and at the position where the user is in the song.
According to a first example aspect there is provided a computer-implemented method for identifying a song, the method comprising:
According to a second example aspect there is provided a system or apparatus comprising:
The apparatus may be or comprise a mobile phone.
The apparatus may be or comprise a smart watch.
The apparatus may be or comprise a tablet computer.
The apparatus may be or comprise a laptop computer.
The apparatus may comprise a smart instrument amplifier, such as a smart guitar amplifier.
The apparatus may comprise a smart speaker, such as a speaker provided with a virtual assistant.
The apparatus may be or comprise a desktop computer.
The apparatus may be or comprise a computer.
According to a third example aspect there is provided a computer program comprising computer executable program code which when executed by at least one processor causes an apparatus at least to perform the method of the first example aspect.
According to a fourth example aspect there is provided a computer program product comprising a non-transitory computer readable medium having the computer program of the third example aspect stored thereon.
According to a fifth example aspect there is provided an apparatus comprising means for performing the method of the first example aspect.
Any foregoing memory medium may comprise a digital data storage such as a data disc or diskette; optical storage; magnetic storage; holographic storage; opto-magnetic storage; phase-change memory; resistive random-access memory; magnetic random-access memory; solid-electrolyte memory; ferroelectric random-access memory; organic memory; or polymer memory. The memory medium may be formed into a device without other substantial functions than storing data, or it may be formed as part of a device with other functions, including but not limited to a memory of a computer; a chip set; and a sub-assembly of an electronic device.
The expression “a number of” refers herein to any positive integer starting from one (1), e.g. to one, two, or three.
The expression “a plurality of” refers herein to any positive integer starting from two (2), e.g. to two, three, or four.
Different non-binding example aspects and embodiments have been illustrated in the foregoing. The embodiments in the foregoing are used merely to explain selected aspects or steps that may be utilized in different implementations. Some embodiments may be presented only with reference to certain example aspects. It should be appreciated that corresponding embodiments may apply to other example aspects as well.
Some example embodiments will be described with reference to the accompanying figures, in which:
In the following description, like reference signs denote like elements or steps.
The communication interface 210 comprises, in an embodiment, wired and/or wireless communication circuitry, such as Ethernet; Wireless LAN; Bluetooth; GSM; CDMA; WCDMA; LTE; and/or 5G circuitry. The communication interface can be integrated in the apparatus 200 or provided as a part of an adapter, card, or the like that is attachable to the apparatus 200. The communication interface 210 may support one or more different communication technologies. The apparatus 200 may also or alternatively comprise more than one of the communication interfaces 210.
In this document, a processor may refer to a central processing unit (CPU); a microprocessor; a digital signal processor (DSP); a graphics processing unit; an application specific integrated circuit (ASIC); a field programmable gate array; a microcontroller; or a combination of such elements.
The user interface 230 may comprise circuitry for receiving input from a user of the apparatus 200, e.g., via a keyboard; a graphical user interface shown on the display of the apparatus 200; speech recognition circuitry; or an accessory device, such as a microphone, a headset, or a line-in audio 250 connection for receiving the performance audio signal; and for providing output to the user via, e.g., a graphical user interface or a loudspeaker.
The memory 240 comprises a work memory and a persistent memory configured to store computer program code and data. The memory 240 may comprise any one or more of: a read-only memory (ROM); a programmable read-only memory (PROM); an erasable programmable read-only memory (EPROM); a random-access memory (RAM); a flash memory; a data disk; an optical storage; a magnetic storage; a smart card; a solid-state drive (SSD); or the like. The apparatus 200 may comprise a plurality of the memories 240. The memory 240 may be constructed as a part of the apparatus 200 or as an attachment to be inserted into a slot; port; or the like of the apparatus 200 by a user or by another person or by a robot. The memory 240 may serve the sole purpose of storing data or be constructed as a part of an apparatus 200 serving other purposes, such as processing data.
A skilled person appreciates that, in addition to the elements shown, the apparatus 200 may comprise other elements that are not discussed in detail herein.
The method may further comprise any one or more of:
The user performance may comprise, for example, the playing of a plurality of opening chords of a song, or of some other position or part of a song. Additionally, user play history is combined with the detected user performance so that a song corresponding to the user performance may be detected more effectively from a large set of songs. The mobile apparatus is provided with user play history, song information and backing track audio data from an external server or cloud arrangement. As mentioned, such song information may comprise notation of the song but also further information, such as practice information, alternative versions of the song, and/or various types of visualization of the song or of a user performance thereof. Alternatively, the mobile apparatus may store at least part of the user play history, song information and backing track data. This way, the user may utilize the system and method with very little user intervention or operation of the user interface, and focus on free expression on their instrument, which is automatically detected and matched with a song, for which song information may then be provided automatically, for example to accompany the user's playing or to help them play or practice the song.
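By way of a non-limiting illustration, the following Python sketch shows one way in which a play-history prior could be combined with a performance-based match score to rank candidate songs. All names, the smoothing constant, and the crude in-order chord matching are illustrative assumptions of this sketch, not features mandated by the present disclosure:

```python
from collections import Counter

def history_prior(play_counts, song_ids, smoothing=1.0):
    """Turn per-song play counts into a smoothed prior probability per song."""
    total = sum(play_counts.get(s, 0) + smoothing for s in song_ids)
    return {s: (play_counts.get(s, 0) + smoothing) / total for s in song_ids}

def chord_match_likelihood(performed, song_chords):
    """Crude likelihood: fraction of performed chords found, in order, in the chart."""
    i, hits = 0, 0
    for chord in performed:
        while i < len(song_chords) and song_chords[i] != chord:
            i += 1
        if i < len(song_chords):
            hits += 1
            i += 1
    return (hits + 1e-6) / (len(performed) + 1e-6)

def rank_songs(performed, songs, play_counts):
    """Posterior-style score: history prior times performance likelihood."""
    prior = history_prior(play_counts, list(songs))
    scores = {s: prior[s] * chord_match_likelihood(performed, chart)
              for s, chart in songs.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Hypothetical usage: two songs share their opening chords; play history
# disambiguates in favor of the song the user has played more often.
songs = {"songA": ["C", "G", "Am", "F"], "songB": ["C", "G", "F", "C"]}
play_counts = Counter({"songA": 12, "songB": 1})
print(rank_songs(["C", "G"], songs, play_counts))
```

In this example, both songs match the two opening chords equally well, so the play-history prior determines the ranking.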
Many tempo estimation techniques are known in the prior art and may be used. Examples of estimating user activity, playing position and tempo are discussed hereinbelow; all of these are obtained by analyzing the performance audio signal in real time:
Activity features indicate when the user is actually playing, as opposed to momentarily not producing any sounding notes from the instrument. The latter can be due to any reason, such as a rest (silent point) in the rhythmic pattern applied, or due to the performer pausing their performance. Accordingly, activity features play two roles in our system: 1) they allow weighting the calculated likelihoods of different chords in such a way that more importance is given to time points in the performance where the performer actually plays something (that is, where performance information is present); 2) they allow the method to keep the estimated position fixed when the performer pauses, and to continue moving the position forward when performance resumes. For amateur performers, it is not uncommon to hesitate and even stop for a moment, for example to figure out a hand position on the instrument. Also, when performing at home, it is not uncommon to pause performing for a while, for example to talk with another person. More technically, activity features describe in an embodiment the probability of any notes sounding in a given audio segment, p(NotesSounding|AudioSegment(t)), as a real number between 0 and 1.
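By way of a non-limiting illustration, one simple realization of such an activity feature maps the short-time energy of the performance audio to a value between 0 and 1. The frame sizes, noise-floor constant, and logistic mapping below are assumptions of this sketch; the disclosure only calls for some estimate of p(NotesSounding|AudioSegment(t)):

```python
import numpy as np

def activity_features(audio, sr, frame_len=2048, hop=512, floor_db=-60.0):
    """Map short-time energy to a pseudo-probability of notes sounding, in [0, 1].

    The logistic mapping and its constants are illustrative assumptions;
    the disclosure only requires p(NotesSounding | AudioSegment(t)).
    """
    n_frames = max(1, 1 + (len(audio) - frame_len) // hop)
    probs = np.empty(n_frames)
    for t in range(n_frames):
        frame = audio[t * hop : t * hop + frame_len]
        rms = np.sqrt(np.mean(frame ** 2) + 1e-12)
        level_db = 20.0 * np.log10(rms + 1e-12)
        # Logistic squashing: frames well above the noise floor -> prob near 1.
        probs[t] = 1.0 / (1.0 + np.exp(-0.3 * (level_db - floor_db - 20.0)))
    return probs

# Hypothetical usage: half a second of silence followed by a 440 Hz tone.
sr = 22050
silence = np.zeros(sr // 2)
tone = 0.3 * np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
print(activity_features(np.concatenate([silence, tone]), sr).round(2))
```

For the synthetic input, the silent half maps to probabilities near 0 and the tone to probabilities near 1.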
Tonal features monitor the pitch content of the user's performance. As described above, when the user is performing from a lead sheet, we do not know in advance the exact notes that the user will play, nor their timing: the arrangement/texture of the music is unknown in advance. For that reason, we instead employ an array of models that represent the different chords that may appear in the lead sheets. The models allow calculating a “match” or “score” for those chords: the likelihood that the corresponding chord is sounding in a given segment of the performance audio. Note that the system can even be totally agnostic about the component notes of each chord, for example when the model for each chord is trained from audio data, giving it examples where the chord is/is not sounding. The tonality feature vector is obtained by calculating a match between a given segment of performance audio and all the unique chords that occur in the song. More technically: the probabilities of different chords sounding in a given audio segment t, p(Chord(i)|AudioSegment(t)), where the chord index i=1, 2, . . . , <number of unique chords in the song>. Tonality features help us to estimate the probability of the performer being at different parts of the song. Amateur performers sometimes jump backward in the performance to repeat a short segment or to fix a performance mistake. Jumps forward are also possible. The harmonic content of the user's playing allows the method to “anchor” the user's position in the song even in the presence of such jumps.
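By way of a non-limiting illustration, the sketch below scores the unique chords of a song against a 12-dimensional chroma (pitch-class energy) vector using binary triad templates. As noted above, the chord models could equally well be trained from audio data; the template set and the normalization used here are assumptions of this sketch, and the computation of the chroma vector itself is outside its scope:

```python
import numpy as np

# Pitch classes: C=0, C#=1, ..., B=11. Binary triad templates are one simple
# choice of chord model; models trained from audio would also fit the method.
CHORD_TEMPLATES = {
    "C": [0, 4, 7], "F": [5, 9, 0], "G": [7, 11, 2], "Am": [9, 0, 4],
}

def chord_probabilities(chroma, song_chords):
    """p(Chord(i) | AudioSegment(t)) over the unique chords of one song.

    `chroma` is a 12-dimensional pitch-class energy vector for one segment.
    """
    scores = {}
    for name in song_chords:
        template = np.zeros(12)
        template[CHORD_TEMPLATES[name]] = 1.0
        # Match = normalized correlation between chroma and chord template.
        scores[name] = float(chroma @ template) / (
            np.linalg.norm(chroma) * np.linalg.norm(template) + 1e-12)
    total = sum(scores.values()) + 1e-12
    return {name: s / total for name, s in scores.items()}

# Hypothetical usage: a chroma vector dominated by the notes C, E and G.
chroma = np.zeros(12)
chroma[[0, 4, 7]] = [1.0, 0.8, 0.9]
print(chord_probabilities(chroma, ["C", "F", "G", "Am"]))
```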
Tempo features are used to estimate the tempo (or playing speed) of the performer in real time. In many songs, there are segments where the chord does not change for a long time. Within such segments, the estimated tempo of the user drives the performer's position forward. In other words, even in the absence of chord changes (harmonic changes), having an estimate of the tempo of the user allows us to keep updating the performer's position. More technically: the probabilities of different tempos (playing speeds) given the performance audio segments 0, 1, 2, . . . , t, that is, p(Tempo(j)|AudioSegment(0, 1, 2, . . . , t)), where the index j covers all tempo values between a minimum and maximum tempo of interest.
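By way of a non-limiting illustration, the sketch below derives a tempo posterior over a grid of candidate tempos by autocorrelating an onset-strength envelope. Autocorrelation is one common, known tempo estimation technique rather than one mandated by the disclosure, and how the envelope itself is computed is outside the sketch:

```python
import numpy as np

def tempo_probabilities(onset_env, frame_rate, bpm_min=60, bpm_max=180):
    """p(Tempo(j) | AudioSegment(0..t)) over a grid of candidate tempos.

    `onset_env` is an onset-strength envelope sampled at `frame_rate` frames
    per second; scoring each tempo by the envelope's autocorrelation at the
    corresponding beat lag is an assumption of this sketch.
    """
    env = onset_env - onset_env.mean()
    bpms = np.arange(bpm_min, bpm_max + 1)
    scores = np.empty(len(bpms))
    for j, bpm in enumerate(bpms):
        lag = int(round(frame_rate * 60.0 / bpm))  # envelope frames per beat
        if lag >= len(env):
            scores[j] = 0.0
            continue
        scores[j] = max(0.0, float(np.dot(env[:-lag], env[lag:])))
    probs = scores / (scores.sum() + 1e-12)
    return bpms, probs

# Hypothetical usage: a synthetic envelope pulsing at 120 BPM (2 beats/s).
frame_rate = 100  # envelope frames per second
t = np.arange(10 * frame_rate)
onset_env = (t % (frame_rate // 2) == 0).astype(float)  # impulse every 0.5 s
bpms, probs = tempo_probabilities(onset_env, frame_rate)
print(bpms[np.argmax(probs)])  # expected to be near 120
```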
By combining the information from the above-mentioned three features and backing track information, we can tackle the various challenges in tracking the position x(t) and playing tempo of an amateur performer, and set a backing track corresponding to that position and playing tempo, wherein:
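By way of a non-limiting illustration of such a combination, the following sketch performs one update step of a simple discrete Bayesian filter over song positions: the tempo posterior supplies the motion model, the chord probabilities supply the observation likelihood, and the activity feature both freezes the position during pauses and down-weights evidence from near-silent frames. The filter structure, the 0.5 activity threshold, and all names are assumptions of this sketch:

```python
import numpy as np

def track_position(prior, chord_probs, chart, activity, step_probs):
    """One update step of a discrete Bayesian filter over song positions x(t).

    prior       : np.ndarray, probability over positions at time t-1
    chord_probs : dict, p(Chord(i) | AudioSegment(t)) from the tonal features
    chart       : list, chord name annotated at each discrete position
    activity    : float, p(NotesSounding | AudioSegment(t)) from activity features
    step_probs  : dict {advance_in_positions: prob}, derived from the tempo
                  posterior (e.g. {0: 0.1, 1: 0.8, 2: 0.1})
    """
    n = len(prior)
    # 1) Motion model: tempo drives the position forward; a pausing performer
    #    (low activity) keeps the position fixed, per the activity features.
    predicted = np.zeros(n)
    if activity < 0.5:
        predicted[:] = prior  # hold position while the performer is silent
    else:
        for step, p_step in step_probs.items():
            shifted = np.roll(prior, step)
            shifted[:step] = 0.0  # do not wrap around past the song start
            predicted += p_step * shifted
    # 2) Observation model: weight by the chord likelihood at each position,
    #    softened by activity so near-silent frames carry little evidence.
    likelihood = np.array([chord_probs.get(c, 1e-6) for c in chart])
    posterior = predicted * likelihood ** activity
    return posterior / (posterior.sum() + 1e-12)

# Hypothetical usage: a 4-position chart; evidence strongly suggests chord "F".
chart = ["C", "G", "Am", "F"]
prior = np.array([0.1, 0.2, 0.6, 0.1])
chord_probs = {"C": 0.05, "G": 0.05, "Am": 0.1, "F": 0.8}
print(track_position(prior, chord_probs, chart, activity=0.9,
                     step_probs={0: 0.1, 1: 0.8, 2: 0.1}))
```

In the example, the probability mass moves from the third chart position to the fourth, consistent with both the tempo-driven advance and the strong evidence for chord “F”.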
Any of the above-described methods, method steps, or combinations thereof, may be controlled or performed using hardware; software; firmware; or any combination thereof. The software and/or hardware may be local; distributed; centralized; virtualized; or any combination thereof. Moreover, any form of computing, including computational intelligence, may be used for controlling or performing any of the afore-described methods, method steps, or combinations thereof. Computational intelligence may refer to, for example, any of artificial intelligence; neural networks; fuzzy logics; machine learning; genetic algorithms; evolutionary computation; or any combination thereof.
Various embodiments have been presented. It should be appreciated that in this document, the words “comprise”, “include”, and “contain” are each used as open-ended expressions with no intended exclusivity.
The foregoing description has provided by way of non-limiting examples of particular implementations and embodiments a full and informative description of the best mode presently contemplated by the inventors for carrying out the invention. It is however clear to a person skilled in the art that the invention is not restricted to details of the embodiments presented in the foregoing, but that it can be implemented in other embodiments using equivalent means or in different combinations of embodiments without deviating from the characteristics of the invention.
Furthermore, some of the features of the afore-disclosed example embodiments may be used to advantage without the corresponding use of other features. As such, the foregoing description shall be considered as merely illustrative of the principles of the present invention, and not in limitation thereof. Hence, the scope of the invention is only restricted by the appended patent claims.
Number | Date | Country | Kind
---|---|---|---
20227059 | Apr 2022 | FI | national