The subject matter disclosed herein generally relates to the technical field of special-purpose machines that perform audio processing, including computerized variants of such special-purpose machines and improvements to such variants, and to the technologies by which such special-purpose machines become improved compared to other special-purpose machines that perform audio processing. Specifically, the present disclosure addresses systems and methods to facilitate characterizing audio using transchromagrams.
Tonality, the harmonic and melodic structure of musical notes, is a core element of music. Chromagrams, which can be represented using data structures, can be used as audio signal processing inputs in the computational extraction of frequency information, such as tonality information. A chromagram can be generated (e.g., calculated) by performing, for example, a Constant Q Transform (CQT), a Fourier Transform, etc. of a time window (e.g., a time frame) of an audio signal and then mapping the energies of the transform into various ranges of frequencies (e.g., a high band, a middle band, a low band, etc.).
Some examples are illustrated by way of example and not limitation in the figures of the accompanying drawings.
The figures are not to scale. Wherever possible, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts. Connecting lines or connectors shown in the various figures presented are intended to represent example functional relationships and/or physical or logical couplings between the various elements.
Tonality, the harmonic and melodic structure of musical notes, is a core element of music, but the problem of using computational methods to reliably extract this information from audio remains unsolved. There has been some limited work done in configuring a machine (e.g., a musical information retrieval machine) to perform identification of the musical chords of a song or the musical key of the song, but existing efforts do not provide broad usefulness across the full musical landscape. Accordingly, a musician trying to play along with a randomly chosen song using harmony information (e.g., musical key or chords) obtained using the current computational extraction methods will likely experience frustration at the inaccuracy of the harmony information.
Music may be characterized by mapping energy of the music in a time window into various ranges of frequencies (e.g., a high band, a middle band, and a low band). Similar mappings may be performed for multiple time windows (e.g., a series of time frames) within a song. These mappings can be combined (e.g., grouped) together to represent (e.g., model or otherwise indicate) the energies in the frequency ranges over time within the audio signal. Various preprocessing operations, post-processing operations, or both, can be applied to the combined mappings to remove non-tonal energies and align the represented energies into their respective frequency ranges. From this point, example computational extractions of frequency information can apply some metric to quantify similarity between time frames of different chromagrams.
Disclosed example machines (e.g., an audio processor machine) may be configured to interact with one or more users to provide information regarding an audio signal or audio content thereof (e.g., in response to a user-submitted request for such information). In some examples, such information may identify the audio content, characterize (e.g., describe) the audio content, identify similar audio content (e.g., as suggestions or recommendations), or any suitable combination thereof. For example, the machine may perform audio fingerprinting to identify audio content (e.g., by comparing a query fingerprint of an excerpt of the audio content against one or more reference fingerprints stored in a database). The machine may perform such operations as part of providing an audio matching service to one or more client devices. In some examples, the machine may interact with one or more users by identifying or describing audio content and providing notifications of the results of such identifications and descriptions to one or more users (e.g., in response to one or more requests). Such a machine may be implemented in a server system (e.g., a network-based cloud of one or more server machines), a client device (e.g., a portable device, an automobile-mounted device, an automobile-embedded device, or other mobile device), or any suitable combination thereof.
Some disclosed example methods (e.g., algorithms) facilitate characterization of audio using a transchromagram and related tasks, and example systems (e.g., special-purpose machines) are configured to facilitate such characterization and related tasks. Broadly speaking, a transchromagram represents the likelihood(s) that a particular note in a piece of music will be followed by another particular note. For example, when an A note is being played it is most likely that a D note and then an E note will follow. Thus, a transchromagram represents the dynamic nature (e.g., how a piece of music changes, evolves, etc. over time) of a piece of music. Because different pieces of music have different progressions of notes (e.g., different melody lines, etc.), they will have different transchromagrams, and, thus, transchromagrams can be used to distinguish one piece of music from another. Examples merely typify possible variations. Unless explicitly stated otherwise, structures (e.g., structural components, such as modules) are optional and may be combined and/or subdivided, and operations (e.g., in a procedure, algorithm, or other function) may vary in sequence, be combined, and/or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of various examples. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.
In a chromagram, each time frame represents, models, encodes, or otherwise indicates energies at various frequencies (e.g., in various frequency ranges that each represent a different musical note) in one time period of the audio content (e.g., within one time frame of a song). However, a chromagram contains no representation of how frequencies (e.g., frequency bins that represent musical notes) change within that period of time. Although it is possible to calculate how notes change from one time frame to another within the chromagram, calculating a similarity metric between two sequential time frames would involve only sequential instantaneous harmonies (e.g., sequential instantaneous combinations of musical notes). Thus, although some tonal information is captured by a chromagram, analyzing chromagrams may ignore tonalities defined by sequential notes (e.g., sequences of notes or sequences of note combinations), which are quite common in music. As a result, chromagram analysis may be vulnerable to overreliance on contemporaneous (e.g., instantaneous) harmonic structure.
However, in music, tonality is defined not only by how multiple contemporaneous notes sound together but also by how they relate to other notes in time. For example, a leading tone is a note that leads the listener's ear to a different note, often resolving some tonally defined musical tension (e.g., musically driven emotional tension). The technique can be employed with multiple notes sounding one after the other and not at the same time (e.g., with no two notes being played within the same time frame). In music theory, musicologists refer to this phenomenon as functional harmony.
In some examples, a transchromagram is a data structure (e.g., a matrix) that can be used to characterize audio data. Furthermore, such characterization may be a basis on which to identify, classify, analyze, represent, or otherwise describe the audio data, and transchromagrams of various audio data can be compared or otherwise analyzed for similarity to identify, select, suggest, or recommend audio data of varying degrees of similarity (e.g., identical, nearly identical, tonally similar, having similar structures, having similar genres, or having similar moods).
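As a non-limiting illustration of comparing two such matrices for similarity, the following minimal sketch scores a pair of equally sized matrices with cosine similarity. The disclosure does not prescribe a particular similarity metric; cosine similarity, the function name `cosine_similarity`, and the nested-list matrix representation are assumptions made only for this illustration.

```python
import math
from typing import List

Matrix = List[List[float]]  # assumed representation: rows of probability values


def cosine_similarity(a: Matrix, b: Matrix) -> float:
    """Return the cosine similarity of two matrices treated as flat vectors.

    Cosine similarity is an assumed metric chosen for illustration only.
    """
    flat_a = [value for row in a for value in row]
    flat_b = [value for row in b for value in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("matrices must have the same number of entries")
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    # Two all-zero matrices carry no tonal information; report no similarity.
    return dot / norm if norm > 0.0 else 0.0
```

Under this assumed metric, matrices with the same relative distribution of values score near 1, and matrices with no overlapping nonzero entries score 0.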
A transchromagram can be derived from any time-domain data, such as audio data that encodes or otherwise represents audio content (e.g., music, such as a song, or noise, such as rhythmic machine-generated noise). As applied to music, a transchromagram can be conceptually described as a probabilistic note transition matrix derived from audio data. Since a chromagram of an audio signal can indicate energies of musical notes (e.g., energies in various frequency bins that each represent a different musical note) as the notes occur over time (e.g., across multiple sequential overlapping or non-overlapping time frames of the audio signal), a transchromagram can be derived from the chromagram of the audio signal.
As discussed in greater detail below, a suitably and specially configured machine generates example transchromagrams by accessing the chromagrams of an audio signal, generating a set of transition matrices based on the chromagrams, with each transition matrix being generated based on a different pair of time frames in the chromagrams, and generating the transchromagram based on the transition matrices (e.g., by averaging or otherwise combining the transition matrices). The machine may be configured to store the transchromagram as metadata of the audio data and use the transchromagram to characterize the audio data (e.g., by characterizing at least the time frames analyzed), and multiple transchromagrams can be compared by the machine (e.g., during similarity analysis) to computationally detect tonally matching, tonally similar, or tonally complementary audio data. In various examples, such a machine may also be configured to perform musical key detection, detection of changes in musical key, musical chord detection, musical genre detection, song identification, derivative song detection (e.g., a live cover version of a studio-recorded song), song structure detection (e.g., detection of AABA structure or other musical patterns), copyright infringement analysis, or any suitable combination thereof.
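As a non-limiting illustration of the generation described above, the following minimal sketch derives a second-order transchromagram from a chromagram by building one transition matrix per pair of sequential time frames and averaging the results. The way each transition matrix is computed here (the product of normalized energies in the two frames) is an assumption made for illustration, as are the function names `normalize`, `transition_matrix`, and `transchromagram` and the nested-list representation.

```python
from typing import List

Frame = List[float]          # energy per note bin in one time frame
Matrix = List[List[float]]   # Matrix[i][j]: note i followed by note j


def normalize(frame: Frame) -> Frame:
    """Scale a frame so its energies sum to 1 (uniform if the frame is silent)."""
    total = sum(frame)
    if total <= 0.0:
        return [1.0 / len(frame)] * len(frame)
    return [energy / total for energy in frame]


def transition_matrix(first: Frame, second: Frame) -> Matrix:
    """Assumed model: entry [i][j] is the normalized energy of note i in the
    first frame multiplied by the normalized energy of note j in the second."""
    p, q = normalize(first), normalize(second)
    return [[pi * qj for qj in q] for pi in p]


def transchromagram(chromagram: List[Frame]) -> Matrix:
    """Average the transition matrices of every pair of sequential frames."""
    if len(chromagram) < 2:
        raise ValueError("need at least two time frames")
    bins = len(chromagram[0])
    total = [[0.0] * bins for _ in range(bins)]
    pairs = len(chromagram) - 1
    for first, second in zip(chromagram, chromagram[1:]):
        matrix = transition_matrix(first, second)
        for i in range(bins):
            for j in range(bins):
                total[i][j] += matrix[i][j]
    return [[value / pairs for value in row] for row in total]
```

Because each per-pair matrix in this sketch sums to 1, the averaged result also sums to 1 and can be read as a distribution over note-to-note transitions. Other ways of combining the transition matrices are equally consistent with the description.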
Reference will now be made in detail to non-limiting examples of this disclosure, examples of which are illustrated in the accompanying drawings. The examples are described below by referring to the drawings.
Also shown in
Any of the example systems or machines (e.g., databases and devices) shown in
In some examples, a database is a data storage resource and may store data structured as a text file, a table, a spreadsheet, a relational database (e.g., an object-relational database), a triple store, a hierarchical data store, or any suitable combination thereof. Moreover, any two or more of the systems or machines illustrated in
The example network 190 of
As shown in
While an example manner of implementing the example audio processor machine 110 of
As shown in
While an example manner of implementing the example device 130 of
Any one or more of the components (e.g., modules) described herein may be implemented using hardware alone (e.g., one or more of the processors 299), or a combination of hardware and software. For example, any component described herein may physically include an arrangement of one or more of the example processors 299 (e.g., a subset of or among the processors 299) configured to perform the operations described herein for that component. As another example, any component described herein may include software, hardware, or both, that configure an arrangement of one or more of the processors 299 to perform the operations described herein for that component. Accordingly, different components described herein may include and configure different arrangements of the processors 299 at different points in time or a single arrangement of the processors 299 at different points in time. Each component (e.g., module) described herein is an example of a means for performing the operations described herein for that component. Moreover, any two or more components described herein may be combined into a single component, and the functions described herein for a single component may be subdivided among multiple components. Furthermore, according to various examples, components described herein as being implemented within a single system or machine (e.g., a single device) may be distributed across multiple systems or machines (e.g., multiple devices).
As shown in the example of
As shown in the example of
As further shown in
As an illustrative example of a single transition matrix,
According to some examples, a transition matrix (e.g., similar to the transition matrix 501) may be a three-dimensional (3D) transition matrix, and the transition matrices 500 may contain 3D transition matrices. An example 3D transition matrix is generated from three time frames (e.g., the example sequential time frames 401, 402, and 403) of the example chromagram 420 and indicates probability values (e.g., similar to the example probability values 510) that quantify and specify probabilities of a first musical note (e.g., a starting note) transitioning to a second musical note (e.g., an intermediate note) and then transitioning to a third musical note (e.g., an ending note).
Similarly, according to certain examples, a transition matrix (e.g., similar to the transition matrix 501) may be a four-dimensional (4D) transition matrix, and the transition matrices 500 may contain 4D transition matrices. An example 4D transition matrix is generated from four time frames (e.g., the example sequential time frames 401, 402, 403, and 404) of the example chromagram 420 and indicates probability values (e.g., similar to the example probability values 510) that quantify and specify probabilities of a first musical note (e.g., a starting note) transitioning to a second musical note (e.g., a first intermediate note), then to a third musical note (e.g., a second intermediate note), and then to a fourth musical note (e.g., an ending note).
Furthermore, according to various examples, a transition matrix (e.g., similar to the example transition matrix 501) may be a five-dimensional (5D) transition matrix, and the transition matrices 500 may contain 5D transition matrices. An example 5D transition matrix is generated from five time frames (e.g., sequential time frames 401, 402, 403, 404, and 405) of the chromagram 420 and indicates probability values (e.g., similar to the example probability values 510) that quantify and specify probabilities of a first musical note (e.g., a starting note) transitioning to a second musical note (e.g., a first intermediate note), then to a third musical note (e.g., a second intermediate note), then to a fourth musical note (e.g., a third intermediate note), and then to a fifth musical note (e.g., an ending note). The present disclosure additionally contemplates transition matrices of even higher dimensionality (e.g., six-dimensional, seven-dimensional, eight-dimensional, etc. transition matrices).
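As a non-limiting illustration of a transition matrix of arbitrary dimensionality, the following minimal sketch builds one n-dimensional transition matrix from n time frames, keyed by the sequence of note bins traversed. The probability model (a product of normalized energies, one factor per frame), the dictionary representation, and the name `transition_tensor` are assumptions made for illustration only.

```python
from itertools import product
from math import prod
from typing import Dict, List, Sequence, Tuple

NotePath = Tuple[int, ...]  # e.g., (start bin, intermediate bin, end bin)


def transition_tensor(frames: Sequence[Sequence[float]]) -> Dict[NotePath, float]:
    """Build one n-dimensional transition matrix from n equally sized frames.

    Assumed model: the value for a path of note bins is the product of the
    normalized energy of each bin in its corresponding frame.
    """
    normalized: List[List[float]] = []
    for frame in frames:
        total = sum(frame)
        if total <= 0.0:
            raise ValueError("each frame needs some energy")
        normalized.append([energy / total for energy in frame])
    bins = range(len(normalized[0]))
    return {
        path: prod(frame[index] for frame, index in zip(normalized, path))
        for path in product(bins, repeat=len(normalized))
    }
```

With twelve note bins, a matrix built this way from n frames has 12**n entries (1,728 entries for three frames), so the storage cost grows quickly with dimensionality.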
As shown in
As noted above, in various examples, the resulting example transchromagram 600 of
In examples where the example transchromagram 600 is generated by combining 2D transition matrices, the transchromagram 600 may be described as a second-order transchromagram. Similarly, where the example transchromagram 600 is generated from 3D transition matrices, the transchromagram 600 may be described as a third-order transchromagram; where the example transchromagram 600 is generated from 4D transition matrices, the transchromagram 600 may be described as a fourth-order transchromagram; where the example transchromagram 600 is generated by combining 5D transition matrices, the transchromagram 600 may be described as a fifth-order transchromagram; and so on.
In this example, the machine-readable instructions comprise a program for execution by a processor such as the processor 1002 shown in the example processor platform 1000 discussed below in connection with
As mentioned above, the example processes of
In operation 710, the example chromagram accessor 220 of
In operation 720, the example transchromagram generator 230 of
As a 2D example, a first example 2D transition matrix (e.g., the example transition matrix 501) may be generated from a first pair of time frames (e.g., the example sequential time frames 401 and 402); a second example 2D transition matrix may be generated from a second pair of time frames (e.g., the example sequential time frames 402 and 403); a third example 2D transition matrix may be generated from a third pair of time frames (e.g., the example sequential time frames 403 and 404); and so on. As a 3D example, a first example 3D transition matrix (e.g., the example transition matrix 501) may be generated from a first trio of time frames (e.g., the example sequential time frames 401, 402, and 403); a second example 3D transition matrix may be generated from a second trio of time frames (e.g., the example sequential time frames 402, 403, and 404); a third example 3D transition matrix may be generated from a third trio of time frames (e.g., the example sequential time frames 403, 404, and 405); and so on. As a 4D example, a first example 4D transition matrix (e.g., the example transition matrix 501) may be generated from a first quartet of time frames (e.g., the example sequential time frames 401, 402, 403, and 404); a second example 4D transition matrix may be generated from a second quartet of time frames (e.g., the example sequential time frames 402, 403, 404, and 405); a third example 4D transition matrix may be generated from a third quartet of time frames (e.g., the example sequential time frames 403, 404, 405, and 406); and so on. Similarly, as a 5D example, different quintets (e.g., sequential quintets) of time frames may be used to generate each individual 5D transition matrix (e.g., transition matrix 501).
Higher-order transition matrices (e.g., the example transition matrices 500) are also contemplated, for example, with six-dimensional (6D) transition matrices (e.g., the example transition matrices 500) being generated from different sextets of time frames, with seven-dimensional (7D) transition matrices (e.g., the example transition matrices 500) being generated from different septets of time frames, with eight-dimensional (8D) transition matrices (e.g., the example transition matrices 500) being generated from different octets of time frames, with nine-dimensional (9D) transition matrices (e.g., the example transition matrices 500) being generated from different nonets of time frames, with ten-dimensional (10D) transition matrices (e.g., the example transition matrices 500) being generated from different dectets of time frames, and so on.
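As a non-limiting illustration of how the groups of sequential time frames described above can be enumerated, the following minimal sketch yields every overlapping run of a chosen length, advancing one time frame at a time. The function name `frame_windows` and the parameter name `order` are hypothetical and used only for this illustration.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def frame_windows(frames: Sequence[T], order: int) -> Iterator[Tuple[T, ...]]:
    """Yield each run of `order` sequential frames, advancing one frame at a time.

    order=2 yields pairs, order=3 yields trios, order=4 yields quartets, etc.
    """
    if order < 2:
        raise ValueError("a transition needs at least two frames")
    for start in range(len(frames) - order + 1):
        yield tuple(frames[start:start + order])
```

Applied to a sequence of five time frames labeled 401 through 405 with `order=3`, the sketch yields the trios (401, 402, 403), (402, 403, 404), and (403, 404, 405). A sequence shorter than `order` yields nothing.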
In operation 730, the example transchromagram generator 230 of
In operation 740, the example database controller 240 of
As shown in
In operation 810, the example chromagram accessor 220 of
In operation 812, the example chromagram accessor 220 generates (e.g., creates) the example chromagram 420 of the audio data 410 based on the transform calculated in operation 810. This may be performed in a manner similar to that described above with respect to
According to various examples, the chromagram 420 may represent a set of frequency ranges (e.g., musical note bins) that span one or more musical octaves. Thus, in some examples, an operation 813 is performed as part of operation 812. In operation 813, the generation of the chromagram 420 by the example chromagram accessor 220 includes representing both the fundamental frequencies of the audio data 410 and the overtone frequencies of the audio data 410 within one musical octave. Since a single musical octave may include twelve equal-tempered semitone notes, the frequency ranges of the chromagram 420 may partition the one musical octave into twelve equal-tempered semitone notes.
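As a non-limiting illustration of representing fundamental and overtone frequencies within one musical octave, the following minimal sketch folds a list of frequency and energy pairs into twelve equal-tempered semitone bins. The tuning reference of 440 Hz for the note A, the bin ordering that starts at C, the input format, and the names `pitch_class` and `fold_into_octave` are assumptions made for illustration only.

```python
import math
from typing import Iterable, List, Tuple

A4_HZ = 440.0   # assumed tuning reference for the note A above middle C
A4_MIDI = 69    # conventional MIDI note number of that A


def pitch_class(frequency_hz: float) -> int:
    """Map a frequency to a semitone bin 0..11 (0 = C, 9 = A) regardless of octave."""
    semitones_from_a4 = 12.0 * math.log2(frequency_hz / A4_HZ)
    return int(round(A4_MIDI + semitones_from_a4)) % 12


def fold_into_octave(peaks: Iterable[Tuple[float, float]]) -> List[float]:
    """Accumulate (frequency_hz, energy) pairs into one twelve-bin time frame."""
    frame = [0.0] * 12
    for frequency_hz, energy in peaks:
        if frequency_hz > 0.0:  # skip DC or invalid entries
            frame[pitch_class(frequency_hz)] += energy
    return frame
```

In this sketch, energy at 220 Hz and at 440 Hz lands in the same bin, since both frequencies are the note A in different octaves.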
However, in some alternative examples, the set of frequency ranges spans two musical octaves, and an operation 814 is performed as part of operation 812. In operation 814, the generation of the chromagram 420 by the example chromagram accessor 220 of
According to certain examples, the energy values (e.g., energy values 421-426) indicated in the chromagram 420 are normalized prior to generation of the transition matrices 500 to be performed in operation 720. Thus, as shown in
As shown in
As also shown in
As further shown in
Since higher-order transition matrices (e.g., the transition matrices 500) are also contemplated, operations that are analogous to operations 820-824 are likewise contemplated for higher-order transition matrices. Such analogous operations may be included in operation 720, in which the example transchromagram generator 230 of
In some examples, in performing operation 730 to generate the transchromagram 600, the example transchromagram generator 230 of
As shown in
In some examples that include operation 900, the audio data 410 is or includes reference audio data. Also, the reference audio data may be identified (e.g., by the database 115) by a reference identifier (e.g., a filename or a song name) stored in metadata (e.g., within the database 115) that describes the reference audio data (e.g., the audio data 410). Additionally, the reference audio data may be in a reference musical key indicated by the metadata. Moreover, the reference audio data may contain a reference musical chord indicated by the metadata. According to some example implementations, the reference musical chord is an arpeggiated musical chord that includes multiple musical notes played one musical note at a time over multiple sequential time frames (e.g., time frames 401-403) of the reference audio data (e.g., audio data 410). Furthermore, the reference audio data may have a reference song structure of multiple sequential song segments (e.g., indicated as AABA, ABAB, or ABABCB), and the reference song structure may be indicated by the metadata that describes the reference audio data. Furthermore still, the reference audio data may exemplify a reference musical genre (e.g., blues, rock, march, or polka) indicated by the metadata that describes the reference audio data.
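As a non-limiting illustration of the metadata described above, the following minimal sketch groups the reference identifier, reference musical key, reference musical chords, reference song structure, and reference musical genre into one record alongside a transchromagram. The class name `ReferenceRecord`, its field names, and the sample values are hypothetical and do not reflect any particular database schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReferenceRecord:
    """Hypothetical record pairing reference metadata with a transchromagram."""

    reference_id: str                      # e.g., a filename or a song name
    musical_key: Optional[str] = None      # e.g., "A minor"
    chords: List[str] = field(default_factory=list)
    song_structure: Optional[str] = None   # e.g., "AABA"
    genre: Optional[str] = None            # e.g., "blues"
    transchromagram: List[List[float]] = field(default_factory=list)


# Sample values below are invented purely to show how a record is populated.
record = ReferenceRecord(
    reference_id="example_song.wav",
    musical_key="A minor",
    chords=["Am", "Dm", "E"],
    song_structure="AABA",
    genre="blues",
)
```

Optional fields allow a record to be stored before every attribute of the reference audio data is known and to be filled in later.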
According to some examples that include operation 900, the transchromagram 600 is or includes a reference transchromagram correlated by the database 115 with the reference audio data (e.g., audio data 410) and its metadata (e.g., reference metadata). In this context, the comparison module 250 performs operation 900 using, for example, machine learning. Machine learning can be used to train, for example, a support vector machine, a vector quantizer, a recurrent neural network, a convolutional neural network, a Gaussian mixture model, etc. For example, a support vector machine may be trained to recognize (e.g., detect or identify) the reference audio data (e.g., the audio data 410) based on the reference transchromagram (e.g., the transchromagram 600). As another example, the support vector machine may be trained to recognize the reference musical key of the reference audio data based on the reference transchromagram. As yet another example, the support vector machine may be trained to recognize the reference musical chord contained in the reference audio data based on the reference transchromagram. As a further example, the support vector machine may be trained to recognize the reference song structure of the reference audio data based on the reference transchromagram. As a still further example, the support vector machine may be trained to recognize the reference musical genre of the reference audio data based on the reference transchromagram.
In some examples, the example comparison module 250 of
In operation 910, the example audio data accessor 210 of
For example, the query may include a request to identify the query audio data (e.g., audio data similar to the audio data 410), and the example audio data accessor 210 of
In operation 911, the example chromagram accessor 220 of
In operation 920, the example transchromagram generator 230 of
In operation 930, the example transchromagram generator 230 of
In operation 940, the example comparison module 250 of
In operation 950, the example notification manager 260 of
For example, in response to a request to identify the query audio data (e.g., similar to audio data 410), the example notification manager 260 of
In operation 960, the example database controller 240 of
For example, the example database controller 240 of
According to various examples, one or more of the methodologies described herein may facilitate characterization of audio, and in particular, characterization of audio data using transchromagrams. Moreover, one or more of the example methodologies described herein may facilitate identification of audio data based on transchromagrams, analysis of audio data based on transchromagrams, or any suitable combination thereof. Hence, one or more of the example methodologies described herein may facilitate provision of identification and analysis services regarding audio data, as well as convenient and efficient user service in providing notifications and performing maintenance of metadata for audio data, compared to capabilities of pre-existing systems and methods.
When these effects are considered in aggregate, one or more of the example methodologies described herein may obviate a need for certain efforts or resources that otherwise would be involved in characterization of audio, identification of audio, analysis of audio, or other computationally intensive audio processing tasks. Efforts expended by a user in such audio processing tasks may be reduced by use of (e.g., reliance upon) a special-purpose machine that implements one or more of the example methodologies described herein. Computing resources used by one or more systems or machines (e.g., within the example network environment 100 of
In some examples, the machine 1000 operates as a standalone device or may be communicatively coupled (e.g., networked) to other machines. In a networked deployment, the example machine 1000 of
The example machine 1000 of
The example machine 1000 of
The example data storage 1016 of
In some examples, the example machine 1000 of
As used herein, the term “memory” refers to a machine-readable medium able to store data temporarily or permanently and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the example machine-readable medium 1022 of
Certain examples are described herein as including modules. Modules may constitute software modules (e.g., code stored or otherwise embodied in a machine-readable medium or in a transmission medium), hardware modules, or any suitable combination thereof. A “hardware module” is a tangible (e.g., non-transitory) physical component (e.g., a set of one or more processors) capable of performing certain operations and may be configured or arranged in a certain physical manner. In various examples, one or more computer systems or one or more hardware modules thereof may be configured by software (e.g., an application or portion thereof) as a hardware module that operates to perform operations described herein for that module.
In some examples, a hardware module may be implemented mechanically, electronically, hydraulically, or any suitable combination thereof. For example, a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations. A hardware module may be or include a special-purpose processor, such as a field programmable gate array (FPGA) or an ASIC. A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. As an example, a hardware module may include software encompassed within a CPU or other programmable processor. It will be appreciated that the decision to implement a hardware module mechanically, hydraulically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
Accordingly, the phrase “hardware module” should be understood to encompass a tangible entity that may be physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. Furthermore, as used herein, the phrase “hardware-implemented module” refers to a hardware module. Considering examples in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where a hardware module includes a CPU configured by software to become a special-purpose processor, the CPU may be configured as respectively different special-purpose processors (e.g., each included in a different hardware module) at different times. Software (e.g., a software module) may accordingly configure one or more processors, for example, to become or otherwise constitute a particular hardware module at one instance of time and to become or otherwise constitute a different hardware module at a different instance of time.
Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over circuits and buses) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory (e.g., a memory device) to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information from a computing resource).
The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions described herein. As used herein, “processor-implemented module” refers to a hardware module in which the hardware includes one or more processors. Accordingly, the operations described herein may be at least partially processor-implemented, hardware-implemented, or both, since a processor is an example of hardware, and at least some operations within any one or more of the methods discussed herein may be performed by one or more processor-implemented modules, hardware-implemented modules, or any suitable combination thereof.
Moreover, such one or more processors may perform operations in a “cloud computing” environment or as a service (e.g., within a “software as a service” (SaaS) implementation). For example, at least some operations within any one or more of the methods discussed herein may be performed by a group of computers (e.g., as examples of machines that include processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an application program interface (API)). The performance of certain operations may be distributed among the one or more processors, whether residing only within a single machine or deployed across a number of machines. In some examples, the one or more processors or hardware modules (e.g., processor-implemented modules) may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other examples, the one or more processors or hardware modules may be distributed across a number of geographic locations.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and their functionality presented as separate components and functions in example configurations may be implemented as a combined structure or component with combined functions. Similarly, structures and functionality presented as a single component may be implemented as separate components and functions. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Some portions of the subject matter discussed herein may be presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a memory (e.g., a computer memory or other machine memory). Such algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
Unless specifically stated otherwise, discussions herein using words such as “accessing,” “processing,” “detecting,” “computing,” “calculating,” “determining,” “generating,” “presenting,” “displaying,” or the like refer to actions or processes performable by a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or any suitable combination thereof), registers, or other machine components that receive, store, transmit, or display information. Furthermore, unless specifically stated otherwise, the terms “a” or “an” are herein used, as is common in patent documents, to include one or more than one instance. Finally, as used herein, the conjunction “or” refers to a non-exclusive “or,” unless specifically stated otherwise.
The following enumerated embodiments describe various examples of methods, machine-readable media, and systems (e.g., machines, devices, or other apparatus) discussed herein.
A first example is a method comprising:
generating, by executing one or more instructions on a processor, a set of transition matrices based on a plurality of time frames of audio data, each transition matrix in the set being generated based on a different pair of time frames in the plurality of time frames and indicating probabilities that anterior musical notes in an anterior time frame of the pair transition to posterior musical notes in a posterior time frame of the pair;
generating, by executing one or more instructions on a processor, a data structure representing how the audio data changes statistically between the plurality of time frames based on the set of transition matrices; and causing, by executing one or more instructions on a processor, a database to store the data structure within metadata that describes the audio data.
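As a hedged illustration of the storing operation in the first example, the following sketch serializes a small matrix as JSON and stores it within a metadata record for the audio data. The use of SQLite, the table name `audio_metadata`, the column names, the metadata key `"transchromagram"`, and the identifier `"song-001"` are all assumptions made for illustration; the disclosure does not prescribe a particular database or schema.

```python
import json
import sqlite3


def store_data_structure(conn, audio_id, data_structure):
    """Store a matrix-like data structure inside the metadata of an audio item."""
    # Hypothetical schema: one JSON metadata document per audio identifier.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audio_metadata ("
        "audio_id TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
    )
    row = conn.execute(
        "SELECT metadata FROM audio_metadata WHERE audio_id = ?", (audio_id,)
    ).fetchone()
    # Keep any metadata already stored for this audio item.
    metadata = json.loads(row[0]) if row else {}
    metadata["transchromagram"] = data_structure  # assumed key name
    conn.execute(
        "INSERT OR REPLACE INTO audio_metadata (audio_id, metadata) VALUES (?, ?)",
        (audio_id, json.dumps(metadata)),
    )
    conn.commit()


def load_data_structure(conn, audio_id):
    """Return the stored data structure, or None if the audio item is unknown."""
    row = conn.execute(
        "SELECT metadata FROM audio_metadata WHERE audio_id = ?", (audio_id,)
    ).fetchone()
    return json.loads(row[0])["transchromagram"] if row else None


conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
example_matrix = [[0.25, 0.75], [0.5, 0.5]]
store_data_structure(conn, "song-001", example_matrix)
restored = load_data_structure(conn, "song-001")
```

Storing the data structure alongside other metadata in a single record keeps it retrievable by the same identifier that describes the audio data; a production system could equally use a dedicated column or a separate table.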
A second example is the method of the first example, wherein the data structure includes a transchromagram.
A third example is the method of the second example, further including accessing, by executing one or more instructions on a processor, a chromagram of the audio data, the chromagram indicating energy values that occur in corresponding time frames of the audio data at corresponding frequency ranges that partition a set of musical octaves into musical notes that are each represented by a different frequency range among the frequency ranges, the transchromagram being a transchromagram of the chromagram.
A fourth example is the method of the second example, wherein the generating of the transchromagram includes generating a mean transition matrix by averaging the generated set of transition matrices, the generated transchromagram including the generated mean transition matrix.
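The averaging described in the fourth example can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes that each transition matrix is the outer product of two adjacent chroma frames whose energies have been scaled to sum to one, which is only one plausible way to obtain probability-like entries. The function names and the three-bin toy chromagram are hypothetical.

```python
def normalize(frame):
    """Scale a chroma frame so its energies sum to one (assumed normalization)."""
    total = sum(frame)
    if total == 0:
        return [1.0 / len(frame)] * len(frame)  # treat silence as uniform
    return [e / total for e in frame]


def transition_matrix(anterior, posterior):
    """Assumed form: outer product, entry [i][j] pairs anterior note i with posterior note j."""
    a, p = normalize(anterior), normalize(posterior)
    return [[ai * pj for pj in p] for ai in a]


def mean_transition_matrix(chromagram):
    """Average the transition matrices of all adjacent pairs of time frames."""
    pairs = list(zip(chromagram, chromagram[1:]))
    if not pairs:
        raise ValueError("need at least two time frames")
    matrices = [transition_matrix(a, p) for a, p in pairs]
    n = len(chromagram[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]


# Toy chromagram: 3 time frames, 3 note bins (a real one might use 12 or 24 bins).
chromagram = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
mean_matrix = mean_transition_matrix(chromagram)
```

In this toy input, note 0 moves to note 1 and note 1 then moves to note 2, so the mean matrix carries a weight of 0.5 at each of those two transitions and zero elsewhere.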
A fifth example is the method of any of the first example to the fourth example, wherein the generating of the set of transition matrices includes generating a two-dimensional transition matrix based on a pair of time frames selected from the plurality of time frames of the audio data.
A sixth example is the method of the fifth example, wherein the pair of time frames is a sequential pair of adjacent time frames within the audio data, and the generated two-dimensional transition matrix indicates (e.g., by inclusion) a probability of a first musical note transitioning to a second musical note during the sequential pair of adjacent time frames.
A seventh example is the method of any of the first example to the sixth example, wherein the generating of the set of transition matrices includes generating a three-dimensional transition matrix based on a trio of time frames selected from the plurality of time frames of the audio data.
An eighth example is the method of the seventh example, wherein the trio of time frames is a sequential trio of consecutive time frames within the audio data, and the generated three-dimensional transition matrix indicates (e.g., by inclusion) a probability of a first musical note transitioning to a second musical note and then transitioning to a third musical note during the sequential trio of consecutive time frames.
A ninth example is the method of any of the first example to the eighth example, wherein the generating of the set of transition matrices includes generating a four-dimensional transition matrix based on a quartet of time frames selected from the plurality of time frames of the audio data.
A tenth example is the method of the ninth example, wherein the quartet of time frames is a sequential quartet of consecutive time frames within the audio data, and the generated four-dimensional transition matrix indicates (e.g., by inclusion) a probability of a first musical note transitioning to a second musical note, then transitioning to a third musical note, and then transitioning to a fourth musical note during the sequential quartet of consecutive time frames.
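The higher-dimensional transition matrices of the seventh through tenth examples can be illustrated with one recursive helper. The sketch assumes, for illustration only, that an entry is the product of the normalized energies of the notes along a path through consecutive time frames; the helper names `transition_tensor` and `scale` are hypothetical.

```python
def normalize(frame):
    """Scale a chroma frame so its energies sum to one (assumed normalization)."""
    total = sum(frame)
    if total == 0:
        return [1.0 / len(frame)] * len(frame)  # treat silence as uniform
    return [e / total for e in frame]


def scale(nested, weight):
    """Multiply every number in a nested list by weight."""
    if isinstance(nested, list):
        return [scale(item, weight) for item in nested]
    return nested * weight


def transition_tensor(frames):
    """Build a len(frames)-dimensional array over consecutive time frames.

    Entry [i][j][k]... is the product of the normalized energies of note i in
    the first frame, note j in the second frame, note k in the third, and so on.
    """
    head = normalize(frames[0])
    if len(frames) == 1:
        return head
    tail = transition_tensor(frames[1:])
    return [scale(tail, weight) for weight in head]


# Toy frames with 3 note bins each.
trio = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
]
tensor_3d = transition_tensor(trio)                       # 3 x 3 x 3
tensor_4d = transition_tensor(trio + [[0.0, 0.0, 1.0]])   # 3 x 3 x 3 x 3
```

With two frames the same helper yields an ordinary two-dimensional matrix, so a single routine covers the pair, trio, and quartet cases.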
An eleventh example is the method of any of the first through the tenth examples, further comprising normalizing the energy values of the accessed chromagram, the normalized energy values ranging between zero and unity; and wherein the generating of the set of transition matrices is based on the normalized energy values that range between zero and unity.
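One possible normalization consistent with the eleventh example is sketched below. It assumes, for illustration, that every energy value is divided by the largest energy value in the chromagram; other schemes (for example, per-frame normalization) would also yield values between zero and unity. The function name `normalize_chromagram` is hypothetical.

```python
def normalize_chromagram(chromagram):
    """Scale energy values into the range [0, 1] by dividing by the global peak."""
    peak = max(max(frame) for frame in chromagram)
    if peak <= 0:
        # No positive energy anywhere: return an all-zero chromagram.
        return [[0.0 for _ in frame] for frame in chromagram]
    # Energies are expected to be non-negative; clamp any negative value to zero.
    return [[max(e, 0.0) / peak for e in frame] for frame in chromagram]


raw = [
    [2.0, 4.0, 0.0],
    [8.0, 1.0, 2.0],
]
normalized = normalize_chromagram(raw)
```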
A twelfth example is the method of any of the first through the eleventh examples, wherein the audio data is reference audio data identified by a reference identifier stored in the metadata that describes the reference audio data, the transchromagram is a reference transchromagram correlated by the database with the reference audio data, and the method further comprises causing a support vector machine to be trained via machine-learning to recognize the reference audio data based on the reference transchromagram, receiving query audio data to be identified, generating a query transchromagram based on the query audio data, and causing a device to present a notification that the query audio data is identified by the reference identifier based on a comparison of the query transchromagram to the reference transchromagram.
A thirteenth example is the method of any of the first through the twelfth examples, wherein the audio data is reference audio data in a reference musical key indicated by the metadata that describes the reference audio data, the transchromagram is a reference transchromagram correlated by the database with the reference audio data, and the method further comprises causing a support vector machine to be trained via machine-learning to detect the reference musical key based on the reference transchromagram, receiving query audio data to be analyzed, generating a query transchromagram based on the query audio data, and causing a device to present a notification that the query audio data is in the reference musical key based on a comparison of the query transchromagram to the reference transchromagram.
A fourteenth example is the method of any of the first through the thirteenth examples, wherein the audio data is reference audio data that contains a reference musical chord indicated by the metadata that describes the reference audio data, the transchromagram is a reference transchromagram correlated by the database with the reference musical chord, and the method further comprises causing a support vector machine to be trained via machine-learning to detect the reference musical chord based on the reference transchromagram, receiving query audio data to be analyzed, generating a query transchromagram based on the query audio data, and causing a device to present a notification that the query audio data contains the reference musical chord based on a comparison of the query transchromagram to the reference transchromagram.
A fifteenth example is the method of the fourteenth example, wherein the reference musical chord is an arpeggiated musical chord that includes multiple musical notes played one musical note at a time over multiple sequential time frames of the reference audio data.
A sixteenth example is the method of any of the first through the fifteenth examples, wherein the audio data is reference audio data that has a reference song structure of multiple sequential song segments, the reference song structure being indicated by the metadata that describes the reference audio data, the transchromagram is a reference transchromagram correlated by the database with the reference song structure, and the method further comprises causing a support vector machine to be trained via machine-learning to detect the reference song structure based on the reference transchromagram, receiving query audio data to be analyzed, generating a query transchromagram based on the query audio data, and causing a device to present a notification that the query audio data has the reference song structure based on a comparison of the query transchromagram to the reference transchromagram.
A seventeenth example is the method of any of the first through the sixteenth examples, wherein the audio data is reference audio data that exemplifies a reference musical genre indicated by the metadata that describes the reference audio data, the transchromagram is a reference transchromagram correlated by the database with the reference musical genre, and the method further comprises causing a support vector machine to be trained via machine-learning to detect the reference musical genre based on the reference transchromagram, receiving query audio data to be analyzed, generating a query transchromagram based on the query audio data, and causing a device to present a notification that the query audio data exemplifies the reference musical genre based on a comparison of the query transchromagram to the reference transchromagram.
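The comparison of a query transchromagram to reference transchromagrams, as recited in the twelfth through seventeenth examples, can be illustrated with a deliberately simplified stand-in. The sketch below does not train a support vector machine; it substitutes a nearest-reference lookup by cosine similarity purely to show the shape of the comparison step. The labels `"ref-A"` and `"ref-B"`, the two-by-two matrices, and the function names are hypothetical.

```python
import math


def flatten(matrix):
    """Turn a two-dimensional matrix into a flat feature vector."""
    return [value for row in matrix for value in row]


def cosine_similarity(a, b):
    """Cosine similarity between two matrices of equal shape."""
    u, v = flatten(a), flatten(b)
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def best_match(query, references):
    """Return the label of the reference most similar to the query."""
    return max(
        references,
        key=lambda label: cosine_similarity(query, references[label]),
    )


# Hypothetical reference transchromagrams keyed by label
# (e.g., a reference identifier, musical key, chord, or genre).
references = {
    "ref-A": [[0.0, 0.5], [0.5, 0.0]],
    "ref-B": [[0.5, 0.0], [0.0, 0.5]],
}
query = [[0.1, 0.4], [0.4, 0.1]]
match = best_match(query, references)
```

In a system that follows the examples more closely, the flattened transchromagrams would serve as feature vectors for a trained support vector machine rather than for a similarity lookup.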
An eighteenth example is the method of any of the first through the seventeenth examples, further comprising calculating a constant Q transform of the audio data, and creating the chromagram of the audio data based on the constant Q transform of the audio data.
A nineteenth example is a machine-readable medium (e.g., a non-transitory machine-readable storage medium) comprising instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising accessing a chromagram of audio data, the chromagram indicating energy values that occur in corresponding time frames of the audio data at corresponding frequency ranges that partition a set of musical octaves into musical notes that are each represented by a different frequency range among the frequency ranges, generating a set of transition matrices based on a plurality of the time frames of the audio data, each transition matrix in the set being generated based on a different pair of time frames in the plurality and indicating probabilities that anterior musical notes in an anterior time frame of the pair transition to posterior musical notes in a posterior time frame of the pair, generating a transchromagram of the chromagram based on the set of transition matrices generated based on the plurality of the time frames of the audio data, and causing a database to store the transchromagram of the chromagram within metadata that describes the audio data.
A twentieth example is the machine-readable medium of the nineteenth example, wherein the operations further comprise calculating a constant Q transform of the audio data, and generating the chromagram of the audio data based on the constant Q transform of the audio data, and wherein the generating of the chromagram includes representing fundamental frequencies of the audio data and overtone frequencies of the audio data within two musical octaves, and the frequency ranges of the chromagram partition the two musical octaves into twenty-four equal-tempered semitone notes.
A twenty-first example is a system comprising one or more processors, and a memory storing instructions that, when executed by at least one processor among the one or more processors, cause the system to perform operations comprising accessing a chromagram of audio data, the chromagram indicating energy values that occur in corresponding time frames of the audio data at corresponding frequency ranges that partition a set of musical octaves into musical notes that are each represented by a different frequency range among the frequency ranges, generating a set of transition matrices based on a plurality of the time frames of the audio data, each transition matrix in the set being generated based on a different pair of time frames in the plurality and indicating probabilities that anterior musical notes in an anterior time frame of the pair transition to posterior musical notes in a posterior time frame of the pair, generating a transchromagram of the chromagram based on the set of transition matrices generated based on the plurality of the time frames of the audio data, and causing a database to store the transchromagram of the chromagram within metadata that describes the audio data.
A twenty-second example is the system of the twenty-first example, wherein the operations further comprise calculating a constant Q transform of the audio data and generating the chromagram of the audio data based on the constant Q transform of the audio data, and wherein the generating of the chromagram includes representing fundamental frequencies of the audio data and overtone frequencies of the audio data within one musical octave, and the frequency ranges of the chromagram partition the one musical octave into twelve equal-tempered semitone notes.
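The one-octave case of the twenty-second example, in which fundamental and overtone frequencies are represented within twelve equal-tempered semitone notes, can be sketched as a folding of spectral energy into pitch classes. The sketch assumes standard tuning with A4 at 440 Hz and takes a list of (frequency, energy) pairs as input in place of an actual constant Q transform; the function names and the sample spectrum are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(frequency_hz, reference_a4_hz=440.0):
    """Map a frequency to one of 12 equal-tempered pitch classes (C = 0)."""
    semitones_from_a4 = round(12 * math.log2(frequency_hz / reference_a4_hz))
    return (semitones_from_a4 + 9) % 12  # A is index 9 when C is index 0


def fold_to_chroma(spectrum):
    """Fold (frequency_hz, energy) pairs into a 12-bin chroma frame."""
    chroma = [0.0] * 12
    for frequency_hz, energy in spectrum:
        if frequency_hz > 0:  # skip the DC component
            chroma[pitch_class(frequency_hz)] += energy
    return chroma


# A3 (220 Hz), its octave overtone A4 (440 Hz), and C4 (about 261.63 Hz).
spectrum = [(220.0, 1.0), (440.0, 0.5), (261.63, 2.0)]
chroma = fold_to_chroma(spectrum)
```

Because octaves share a pitch class, the energy at 220 Hz and at its overtone at 440 Hz accumulates in the same bin, which is the sense in which fundamentals and overtones are represented within a single octave.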
A twenty-third example is an apparatus including
A twenty-fourth example is the apparatus of the twenty-third example, wherein the transchromagram generator generates the transchromagram by generating a mean transition matrix by averaging the generated set of transition matrices, the generated transchromagram including the generated mean transition matrix.
A twenty-fifth example is the apparatus of the twenty-third example, wherein the transchromagram generator generates the set of transition matrices by generating a transition matrix based on one or more time frames selected from the plurality of time frames of the audio data.
A twenty-sixth example is a non-transitory computer-readable storage medium carrying machine-readable instructions for controlling a machine to carry out any of the previously described examples.
This application claims the priority benefit of U.S. Provisional Patent Application No. 62/381,801, entitled “Characterizing Audio Using a Data Structure,” and filed on Aug. 31, 2016, the entirety of which is incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
20180061382 A1 | Mar 2018 | US |
Number | Date | Country | |
---|---|---|---|
62381801 | Aug 2016 | US |